In this version the following are performed:
Top-feature selection based on the trained models’ feature importance. The results depend on the number of CpGs selected and on the feature-selection method used.
The feature-selection methods serve two main purposes: one is binary classification, the other is multi-class classification.
Top-feature selection is carried out with several selection methods, for example mean feature importance, median-quantile feature importance, and frequency (common-feature) importance.
Two data frames are output for the Pareto-optimal step: one is the data frame filtered to the top number of features under each selection method, and the other is the phenotype data frame.
Finally, the performance of the features selected by the three methods is evaluated.
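The three importance-based selection strategies listed above can be sketched as follows. This is an illustrative sketch only: `imp_mat` (a features-by-models importance matrix) and `n_top` are hypothetical stand-ins for the actual trained-model importances used later in the pipeline.

```r
# Hypothetical importance matrix: rows = features, columns = trained models.
set.seed(123)
imp_mat <- matrix(runif(30), nrow = 10,
                  dimnames = list(paste0("cg", sprintf("%08d", 1:10)),
                                  paste0("model", 1:3)))
n_top <- 5

# Mean importance: rank features by their average importance across models.
top_mean <- names(sort(rowMeans(imp_mat), decreasing = TRUE))[1:n_top]

# Median importance: rank by the per-feature median instead of the mean.
top_median <- names(sort(apply(imp_mat, 1, median), decreasing = TRUE))[1:n_top]

# Frequency / common-feature importance: count how often a feature appears in
# each model's own top-n list, then keep the most frequently selected features.
per_model_top <- apply(imp_mat, 2,
                       function(x) names(sort(x, decreasing = TRUE))[1:n_top])
freq <- sort(table(as.vector(per_model_top)), decreasing = TRUE)
top_freq <- names(freq)[1:min(n_top, length(freq))]
```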
This part collects the inputs; change them as needed.
csv_Ni1905FilePath<-"C:\\Users\\wangtia\\Desktop\\AD Risk\\DataSets\\ADNI_covariate_withEpiage_1905obs.csv"
TopSelectedCpGs_filePath<-"C:\\Users\\wangtia\\Desktop\\AD Risk\\DataSets\\Top5K_CpGs.csv"
# Number of top CpGs kept, based on standard deviation
Number_N_TopNCpGs<-params$INPUT_Number_N_TopNCpGs
# Go to the INPUT section and find "Impute_NA_FLAG_NUM":
# to impute NA values with the mean, set "Impute_NA_FLAG_NUM = 1"
# to impute NA values with the KNN method, set "Impute_NA_FLAG_NUM = 2"
Impute_NA_FLAG_NUM = 1
# Go to the INPUT section and find "METHOD_FEATURE_FLAG_NUM":
# for 3-class classification, set "METHOD_FEATURE_FLAG_NUM = 1"
# for the PCA method, set "METHOD_FEATURE_FLAG_NUM = 2"
# for 2-class classification, set "METHOD_FEATURE_FLAG_NUM = 3"
# for CN vs AD classification, set "METHOD_FEATURE_FLAG_NUM = 4"
# for CN vs MCI classification, set "METHOD_FEATURE_FLAG_NUM = 5"
# for MCI vs AD classification, set "METHOD_FEATURE_FLAG_NUM = 6"
METHOD_FEATURE_FLAG_NUM = 1
# Go to the "INPUT" section to set the number of common features needed
# Generally this is for visualization
NUM_COMMON_FEATURES_SET = 20
NUM_COMMON_FEATURES_SET_Frequency = 20
Flags for the feature-selection output:
# This is the flag for the phenotype data output.
# If set to TRUE, the file is output: the code checks whether a file already exists at the given path; if not, the file is written; if it exists, nothing is written.
# If set to FALSE, the phenotype file is not output.
# NOTE: the phenotype file is selected from "Merged_df".
phenoOutPUt_FLAG = TRUE
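The check-before-write behavior described above can be sketched as follows; `pheno_df` and `pheno_out_path` are hypothetical placeholders for the actual phenotype slice of `Merged_df` and the actual output path.

```r
# Minimal sketch of the flag-guarded phenotype export described above;
# 'pheno_df' and 'pheno_out_path' are hypothetical placeholder names.
phenoOutPUt_FLAG <- TRUE
pheno_df <- data.frame(barcodes = c("s1", "s2"), DX = c("CN", "MCI"))
pheno_out_path <- file.path(tempdir(), "phenotype_sketch.csv")

if (phenoOutPUt_FLAG) {
  if (file.exists(pheno_out_path)) {
    # Existing file is left untouched, matching the behavior described above.
    message("Phenotype file already exists; nothing written.")
  } else {
    write.csv(pheno_df, pheno_out_path, row.names = FALSE)
    message("Phenotype file written.")
  }
}
```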
# For 8.0 Feature Selection and Output:
# NUM_FEATURES <- INPUT_NUMBER_FEATURES
# This is the number of features needed
# Method_Selected_Choose <- INPUT_Method_Selected_Choose
# This is the method used for the output-stage feature selection
INPUT_NUMBER_FEATURES = params$INPUT_OUT_NUMBER_FEATURES
INPUT_Method_Mean_Choose = TRUE
INPUT_Method_Median_Choose = TRUE
INPUT_Method_Frequency_Choose = TRUE
if(INPUT_Method_Mean_Choose|| INPUT_Method_Median_Choose || INPUT_Method_Frequency_Choose){
OUTUT_file_directory<- "C:\\Users\\wangtia\\Desktop\\AD Risk\\part2\\VersionHistory\\Version7_AutoKnit_Results\\Method1_MultiClass\\Method1_MultiClass_SelectedFeatures\\"
OUTUT_CSV_PATHNAME <- paste(OUTUT_file_directory,"INPUT_",Number_N_TopNCpGs,"CpGs\\",sep="")
if (dir.exists(OUTUT_CSV_PATHNAME)) {
message("Directory already exists.")
} else {
dir.create(OUTUT_CSV_PATHNAME, recursive = TRUE)
message("Directory created.")
}
}
## Directory already exists.
FLAG_WRITE_METRICS_DF is the flag controlling whether to output the CSV containing the performance metrics.
# This flag controls outputting the metrics of this file, including the model-training-stage metrics and the performance metrics of the key features selected by the mean, median, and frequency methods
Metrics_Table_Output_FLAG = TRUE
FLAG_WRITE_METRICS_DF = TRUE
if(FLAG_WRITE_METRICS_DF){
OUTUT_PerfMertics_directory<-"C:\\Users\\wangtia\\Desktop\\AD Risk\\part2\\VersionHistory\\Version7_AutoKnit_Results\\Method1_MultiClass\\Method1_MultiClass_PerformanceMetrics\\"
OUTUT_PerformanceMetricsCSV_PATHNAME <- paste(OUTUT_PerfMertics_directory,"INPUT_",Number_N_TopNCpGs,"CpGs_",INPUT_NUMBER_FEATURES,"SelFeature_PerMetrics.csv",sep="")
if (dir.exists(OUTUT_PerfMertics_directory)) {
message("Directory already exists.")
} else {
dir.create(OUTUT_PerfMertics_directory, recursive = TRUE)
message("Directory created.")
}
print(OUTUT_PerformanceMetricsCSV_PATHNAME)
}
## Directory already exists.
## [1] "C:\\Users\\wangtia\\Desktop\\AD Risk\\part2\\VersionHistory\\Version7_AutoKnit_Results\\Method1_MultiClass\\Method1_MultiClass_PerformanceMetrics\\INPUT_5000CpGs_250SelFeature_PerMetrics.csv"
Packages and libraries that may need to be installed and loaded.
# Function to check and install Bioconductor package: "limma"
install_bioc_packages <- function(packages) {
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
for (pkg in packages) {
if (!requireNamespace(pkg, quietly = TRUE)) {
BiocManager::install(pkg, dependencies = TRUE)
} else {
message(paste("Package", pkg, "is already installed."))
}
}
}
install_bioc_packages("limma")
## Package limma is already installed.
print("The required packages are all successfully installed.")
## [1] "The required packages are all successfully installed."
library(limma)
Set the seed for reproducibility.
set.seed(123)
csv_NI1905<-read.csv(csv_Ni1905FilePath)
csv_NI1905_RAW <- csv_NI1905
TopSelectedCpGs<-read.csv(TopSelectedCpGs_filePath, check.names = FALSE)
TopSelectedCpGs_RAW <- TopSelectedCpGs
head(csv_NI1905,n=3)
rownames(csv_NI1905)<-as.matrix(csv_NI1905[,"barcodes"])
dim(csv_NI1905)
## [1] 1905 23
dim(TopSelectedCpGs)
## [1] 5000 1921
head(TopSelectedCpGs[,1:8])
rownames(TopSelectedCpGs)<-TopSelectedCpGs[,1]
head(rownames(TopSelectedCpGs))
## [1] "cg08223187" "cg15794987" "cg04821830" "cg24629711" "cg17380855" "cg10360725"
head(colnames(TopSelectedCpGs))
## [1] "ProbeID" "200223270003_R01C01" "200223270003_R02C01" "200223270003_R03C01"
## [5] "200223270003_R04C01" "200223270003_R05C01"
tail(colnames(TopSelectedCpGs))
## [1] "201046290111_R04C01" "201046290111_R05C01" "201046290111_R06C01" "201046290111_R07C01"
## [5] "201046290111_R08C01" "sdDev"
This part adjusts the set of CpGs to use: it keeps the top N CpGs ranked by standard deviation.
sorted_TopSelectedCpGs <- TopSelectedCpGs[order(-TopSelectedCpGs$sdDev), ]
TopN_CpGs <- head(sorted_TopSelectedCpGs,Number_N_TopNCpGs )
TopN_CpGs_RAW<-TopN_CpGs
The variable “TopN_CpGs” will be used for processing the data. Now let’s take a look at it.
dim(TopN_CpGs)
## [1] 5000 1921
rownames(TopN_CpGs)<-TopN_CpGs[,1]
head(rownames(TopN_CpGs))
## [1] "cg08223187" "cg15794987" "cg04821830" "cg24629711" "cg17380855" "cg10360725"
head(colnames(TopN_CpGs))
## [1] "ProbeID" "200223270003_R01C01" "200223270003_R02C01" "200223270003_R03C01"
## [5] "200223270003_R04C01" "200223270003_R05C01"
tail(colnames(TopN_CpGs))
## [1] "201046290111_R04C01" "201046290111_R05C01" "201046290111_R06C01" "201046290111_R07C01"
## [5] "201046290111_R08C01" "sdDev"
Now, let’s check for duplicate sample IDs (“barcodes”).
Start with people who don’t have a unique ID (“uniqueID = 0”):
library(dplyr)
dim(csv_NI1905[csv_NI1905$uniqueID == 0, ])
## [1] 1256 23
dim(csv_NI1905[csv_NI1905$uniqueID == 1, ])
## [1] 649 23
duplicates <- csv_NI1905[csv_NI1905$uniqueID == 0, ] %>%
group_by(barcodes) %>%
filter(n() > 1) %>%
ungroup()
print(dim(duplicates))
## [1] 0 23
rm(duplicates)
Based on the output dimensions, these records all have different sample IDs (“barcodes”).
Next, check whether any records across the full dataset share a duplicated sample ID (“barcodes”).
duplicates <- csv_NI1905 %>%
group_by(barcodes) %>%
filter(n() > 1) %>%
ungroup()
print(dim(duplicates))
## [1] 0 23
From the above output, we can see that the sample IDs (“barcodes”) are unique.
names(csv_NI1905)
## [1] "barcodes" "RID.a" "prop.B" "prop.NK" "prop.CD4T" "prop.CD8T"
## [7] "prop.Mono" "prop.Neutro" "prop.Eosino" "DX" "age.now" "PTGENDER"
## [13] "ABETA" "TAU" "PTAU" "PC1" "PC2" "PC3"
## [19] "ageGroup" "ageGroupsq" "DX_num" "uniqueID" "Horvath"
The same person may appear at different time points, so we only keep records with a unique ID (“uniqueID = 1”).
csv_NI1905<-csv_NI1905[csv_NI1905$uniqueID == 1, ]
dim(csv_NI1905)
## [1] 649 23
Since “DX” will be the response variable, we first remove all rows with an NA value in the “DX” column.
# "DX" will be Y,remove all rows with NA value in "DX" column
csv_NI1905<-csv_NI1905 %>% filter(!is.na(DX))
We keep only the samples that appear in both datasets.
Matrix_sample_names_NI1905 <- as.matrix(csv_NI1905[,"barcodes"])
Matrix_sample_names_TopN_CpGs <- as.matrix(colnames(TopN_CpGs))
common_sample_names<-intersect(Matrix_sample_names_NI1905,Matrix_sample_names_TopN_CpGs)
csv_NI1905 <- csv_NI1905 %>% filter(barcodes %in% common_sample_names)
TopN_CpGs <- TopN_CpGs[, common_sample_names, drop = FALSE]
head(TopN_CpGs[,1:3],n=2)
dim(TopN_CpGs)
## [1] 5000 648
dim(csv_NI1905)
## [1] 648 23
Merge these two datasets and store the result in “merged_df”.
trans_TopN_CpGs<-t(TopN_CpGs)
# Check the total length of the rownames
# Recall that the sample names have been matched and neither data frame has duplicates
# Now, order the rownames and bind the data frames together. This ensures the merged data frame matches the two source data frames row by row.
trans_TopN_CpGs_ordered<-trans_TopN_CpGs[order(rownames(trans_TopN_CpGs)),]
csv_NI1905_ordered<-csv_NI1905[order(rownames(csv_NI1905)),]
print("The rownames match in order:")
## [1] "The rownames match in order:"
check_1 = length(rownames(csv_NI1905_ordered))
check_2 = sum(rownames(csv_NI1905_ordered)==rownames(trans_TopN_CpGs_ordered))
print(check_1==check_2)
## [1] TRUE
merged_df_raw<-cbind(trans_TopN_CpGs_ordered,csv_NI1905_ordered)
phenotic_features_RAW<-colnames(csv_NI1905)
print(phenotic_features_RAW)
## [1] "barcodes" "RID.a" "prop.B" "prop.NK" "prop.CD4T" "prop.CD8T"
## [7] "prop.Mono" "prop.Neutro" "prop.Eosino" "DX" "age.now" "PTGENDER"
## [13] "ABETA" "TAU" "PTAU" "PC1" "PC2" "PC3"
## [19] "ageGroup" "ageGroupsq" "DX_num" "uniqueID" "Horvath"
phenoticPart_RAW <- merged_df_raw[,phenotic_features_RAW]
dim(phenoticPart_RAW)
## [1] 648 23
head(phenoticPart_RAW)
head(merged_df_raw[,1:3])
merged_df<-merged_df_raw
head(colnames(merged_df))
## [1] "cg08223187" "cg15794987" "cg04821830" "cg24629711" "cg17380855" "cg10360725"
The names of the feature CpGs can be accessed via “featureName_CpGs”:
featureName_CpGs<-rownames(TopN_CpGs)
length(featureName_CpGs)
## [1] 5000
head(featureName_CpGs)
## [1] "cg08223187" "cg15794987" "cg04821830" "cg24629711" "cg17380855" "cg10360725"
clean_merged_df<-merged_df
missing_val_cols <- colnames(clean_merged_df)[colSums(is.na(clean_merged_df)) > 0]
colSums(is.na(clean_merged_df))[missing_val_cols]
## ABETA TAU PTAU
## 109 109 109
Choose the imputation method to apply to the data. The output dataset is named “clean_merged_df”.
# Go to the INPUT section and find "Impute_NA_FLAG_NUM":
# to impute NA values with the mean, set "Impute_NA_FLAG_NUM = 1"
# to impute NA values with the KNN method, set "Impute_NA_FLAG_NUM = 2"
Impute_NA_FLAG = Impute_NA_FLAG_NUM
if (Impute_NA_FLAG == 1){
clean_merged_df_imputed_mean<-clean_merged_df
mean_ABETA_rmNA <- mean(clean_merged_df$ABETA, na.rm = TRUE)
clean_merged_df_imputed_mean$ABETA[
is.na(clean_merged_df_imputed_mean$ABETA)] <- mean_ABETA_rmNA
mean_TAU_rmNA <- mean(clean_merged_df$TAU, na.rm = TRUE)
clean_merged_df_imputed_mean$TAU[
is.na(clean_merged_df_imputed_mean$TAU)] <- mean_TAU_rmNA
mean_PTAU_rmNA <- mean(clean_merged_df$PTAU, na.rm = TRUE)
clean_merged_df_imputed_mean$PTAU[
is.na(clean_merged_df_imputed_mean$PTAU)] <- mean_PTAU_rmNA
clean_merged_df = clean_merged_df_imputed_mean
}
library(VIM)
if (Impute_NA_FLAG == 2){
df_imputed_KNN <- kNN(merged_df, k = 5)
imputed_summary <- colSums(df_imputed_KNN[, grep("_imp", names(df_imputed_KNN))])
print(imputed_summary[imputed_summary > 0])
clean_merged_df<-df_imputed_KNN[, -grep("_imp", names(df_imputed_KNN))]
}
missing_val_cols <- colnames(clean_merged_df)[colSums(is.na(clean_merged_df)) > 0]
colSums(is.na(clean_merged_df))[missing_val_cols]
## named numeric(0)
Choose the feature-selection method we want to use.
# Go to the INPUT section and find "METHOD_FEATURE_FLAG_NUM":
# for 3-class classification, set "METHOD_FEATURE_FLAG_NUM = 1"
# for the PCA method, set "METHOD_FEATURE_FLAG_NUM = 2"
# for 2-class classification, set "METHOD_FEATURE_FLAG_NUM = 3"
METHOD_FEATURE_FLAG = METHOD_FEATURE_FLAG_NUM
if (METHOD_FEATURE_FLAG == 1){
df_fs_method1 <- clean_merged_df
}
if(METHOD_FEATURE_FLAG == 1){
phenotic_features_m1<-c("DX","age.now","PTGENDER",
"PC1","PC2","PC3")
pickedFeatureName_m1<-c(phenotic_features_m1,featureName_CpGs)
df_fs_method1<-clean_merged_df[,pickedFeatureName_m1]
df_fs_method1$DX<-as.factor(df_fs_method1$DX)
df_fs_method1$PTGENDER<-as.factor(df_fs_method1$PTGENDER)
head(df_fs_method1[,1:5],n=3)
dim(df_fs_method1)
}
## [1] 648 5006
if(METHOD_FEATURE_FLAG == 1){
dim(df_fs_method1)
}
## [1] 648 5006
Create the contrast matrix for comparing CN vs Dementia vs MCI.
if(METHOD_FEATURE_FLAG == 1){
pheno_data_m1 <- df_fs_method1[,phenotic_features_m1]
head(pheno_data_m1[,1:5],n=3)
pheno_data_m1$DX <- factor(pheno_data_m1$DX, levels = c("CN", "MCI", "Dementia"))
design_m1 <- model.matrix(~ 0 + DX + age.now + PTGENDER + PC1 + PC2 + PC3,
data = pheno_data_m1)
colnames(design_m1)[colnames(design_m1) == "DXCN"] <- "CN"
colnames(design_m1)[colnames(design_m1) == "DXDementia"] <- "Dementia"
colnames(design_m1)[colnames(design_m1) == "DXMCI"] <- "MCI"
head(design_m1)
cpg_matrix_m1 <- t(as.matrix(df_fs_method1[, featureName_CpGs]))
fit_m1 <- lmFit(cpg_matrix_m1, design_m1)
}
if(METHOD_FEATURE_FLAG == 1){
# here we have three labels; the contrasts comparing the groups are:
contrast_matrix_m1 <- makeContrasts(
MCI_vs_CN = MCI - CN,
Dementia_vs_CN = Dementia - CN,
Dementia_vs_MCI = Dementia - MCI,
levels = design_m1
)
fit2_m1 <- contrasts.fit(fit_m1, contrast_matrix_m1)
fit2_m1 <- eBayes(fit2_m1)
topTable(fit2_m1, coef = "MCI_vs_CN")
topTable(fit2_m1, coef = "Dementia_vs_CN")
topTable(fit2_m1, coef = "Dementia_vs_MCI")
summary_results_m1 <- decideTests(fit2_m1,method = "nestedF", adjust.method = "none", p.value = 0.05)
table(summary_results_m1)
}
## summary_results_m1
## -1 0 1
## 134 14732 134
if(METHOD_FEATURE_FLAG == 1){
significant_dmp_filter_m1 <- summary_results_m1 != 0
significant_cpgs_m1_DMP <- unique(rownames(summary_results_m1)[
apply(significant_dmp_filter_m1, 1, any)])
print(paste("The significant CpGs after DMP are:",
paste(significant_cpgs_m1_DMP, collapse = ", ")))
print(paste("Length of CpGs after DMP:",
length(significant_cpgs_m1_DMP)))
pickedFeatureName_m1_afterDMP<-c(phenotic_features_m1,significant_cpgs_m1_DMP)
df_fs_method1<-df_fs_method1[,pickedFeatureName_m1_afterDMP]
dim(df_fs_method1)
}
## [1] "The significant CpGs after DMP are: cg03278611, cg02621446, cg23916408, cg12146221, cg05234269, cg14293999, cg19377607, cg14307563, cg21209485, cg11331837, cg11187460, cg14564293, cg12012426, cg00999469, cg17421046, cg27639199, cg24851651, cg16788319, cg25879395, cg18339359, cg12284872, cg15014361, cg24506579, cg05321907, cg10985055, cg20139683, cg26212480, cg10750306, cg26777760, cg01667144, cg27341708, cg12466610, cg03327352, cg02320265, cg08779649, cg13885788, cg25561557, cg01413796, cg26069044, cg03088219, cg12682323, cg17738613, cg17186592, cg17906851, cg01933473, cg16771215, cg02902672, cg05476522, cg16211147, cg11438323, cg27086157, cg17479100, cg15535896, cg18821122, cg05841700, cg10738648, cg16579946, cg20370184, cg02122327, cg12784167, cg15633912, cg02494911, cg15907464, cg21854924, cg17970282, cg25436480, cg12534577, cg15865722, cg23762217, cg06864789, cg10306780, cg24859648, cg26822438, cg01733439, cg18403317, cg16178271, cg00675157, cg10369879, cg18136963, cg22274273, cg01128042, cg27558057, cg08198851, cg04412904, cg11227702, cg04841583, cg01150227, cg20913114, cg02932958, cg00962106, cg15775217, cg21697769, cg09227616, cg03651054, cg16715186, cg00696044, cg12738248, cg03900860, cg04302300, cg01013522, cg00616572, cg05096415, cg01153376, cg09854620, cg24861747, cg19512141, cg06378561, cg11673013, cg02356645, cg02372404, cg11978593, cg06950937, cg00272795, cg02834750, cg05305760, cg03071582, cg08584917, cg23161429, cg07138269, cg13080267, cg25758034, cg23658987, cg25259265, cg17224287, cg14924512, cg08697944, cg14710850, cg06118351, cg11664825, cg07480176, cg08857872, cg20678988, cg06875704, cg24873924, cg01921484, cg12776173, cg07466166, cg00247094, cg03084184, cg17329602, cg20116159, cg01549082, cg20549400, cg26948066, cg07523188, cg26474732, cg24263233, cg11133939, cg02225060, cg19741073, cg12279734, cg12377327, cg10240127, cg23432430, cg16652920, cg06112204, cg12228670, cg21905818, cg19503462, cg07028768, cg14240646, cg13663706, cg09584650, 
cg27272246, cg09418035, cg16749614, cg26506212, cg04664583, cg26757229, cg03982462, cg06715136, cg15501526, cg09092713, cg04248279, cg08434396, cg01680303, cg07158503, cg06536614, cg26219488, cg18819889, cg05570109, cg02981548, cg08861434, cg00689685, cg17429539, cg00322003, cg11247378, cg07152869, cg10796603, cg00154902, cg20201388, cg14527649, cg08800033, cg27452255, cg03129555, cg06697310, cg20507276, cg14961598, cg08108858, cg27577781, cg20685672, cg03660162"
## [1] "Length of CpGs after DMP: 202"
## [1] 648 208
if(METHOD_FEATURE_FLAG == 1){
library(recipes)
df_picked <- df_fs_method1
rec <- recipe(DX ~ ., data = df_picked) %>%
step_zv(all_predictors()) %>%
# step_range(all_numeric(), -all_outcomes()) %>%
step_dummy(all_nominal(), -all_outcomes())%>%
step_corr(all_predictors(), threshold = 0.7)
rec_prep <- prep(rec, df_picked)
processed_data_m1 <- bake(rec_prep, new_data = df_picked)
dim(processed_data_m1)
processed_data_m1_df<-as.data.frame(processed_data_m1)
rownames(processed_data_m1_df)<-rownames(df_picked)
}
if(METHOD_FEATURE_FLAG == 1){
AfterProcess_FeatureName_m1<-colnames(processed_data_m1)
head(AfterProcess_FeatureName_m1)
tail(AfterProcess_FeatureName_m1)
}
## [1] "cg06697310" "cg20507276" "cg27577781" "cg20685672" "cg03660162" "DX"
if(METHOD_FEATURE_FLAG == 1){
head(processed_data_m1[,1:5])
}
if(METHOD_FEATURE_FLAG == 1){
lastColumn_NUM<-dim(processed_data_m1)[2]
last5Column_NUM<-lastColumn_NUM-5
head(processed_data_m1[,last5Column_NUM :lastColumn_NUM])
}
if(METHOD_FEATURE_FLAG == 2){
bloodPropFeatureName<-c("RID.a","prop.B","prop.NK",
"prop.CD4T","prop.CD8T","prop.Mono",
"prop.Neutro","prop.Eosino")
pickedFeatureName_m2<-c("DX","age.now",
"PTGENDER",bloodPropFeatureName,
"ABETA","TAU","PTAU",featureName_CpGs)
df_fs_method2<-clean_merged_df[,pickedFeatureName_m2]
}
if(METHOD_FEATURE_FLAG == 2){
library(recipes)
rec <- recipe(DX ~ ., data = df_fs_method2) %>%
step_zv(all_predictors()) %>%
step_normalize(all_numeric(), -all_outcomes()) %>%
step_dummy(all_nominal(), -all_outcomes())%>%
step_corr(all_predictors(), threshold = 0.7)
rec_prep <- prep(rec, df_fs_method2)
processed_data_m2 <- bake(rec_prep, new_data = df_fs_method2)
dim(processed_data_m2)
}
if(METHOD_FEATURE_FLAG == 2){
X_df_m2<-subset(processed_data_m2,select = -DX)
Y_df_m2<-processed_data_m2$DX
pca_result <- prcomp(X_df_m2, center = TRUE, scale. = TRUE)
summary(pca_result)
screeplot(pca_result,type="lines")
}
if(METHOD_FEATURE_FLAG == 2){
PCA_component_threshold<-0.7
}
if(METHOD_FEATURE_FLAG == 2){
library(caret)
preproc<-preProcess(X_df_m2,method="pca",
thresh = PCA_component_threshold)
X_df_m2_transformed_PCA <- predict(preproc,X_df_m2)
data_processed_PCA<-data.frame(X_df_m2_transformed_PCA,Y_df_m2)
colnames(data_processed_PCA)[
which(colnames(data_processed_PCA)=="Y_df_m2")]<-"DX"
head(data_processed_PCA)
}
if(METHOD_FEATURE_FLAG == 2){
processed_data_m2<-data_processed_PCA
AfterProcess_FeatureName_m2<-colnames(data_processed_PCA)
}
if(METHOD_FEATURE_FLAG == 3){
df_fs_method3<-clean_merged_df
}
if(METHOD_FEATURE_FLAG == 3){
phenotic_features_m3<-c(
"DX","age.now","PTGENDER","PC1","PC2","PC3")
pickedFeatureName_m3<-c(phenotic_features_m3,featureName_CpGs)
df_picked_m3<-df_fs_method3[,pickedFeatureName_m3]
df_picked_m3$DX<-as.factor(df_picked_m3$DX)
df_picked_m3$PTGENDER<-as.factor(df_picked_m3$PTGENDER)
head(df_picked_m3[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 3){
dim(df_picked_m3)
}
if(METHOD_FEATURE_FLAG == 3){
df_picked_m3<-df_picked_m3 %>% mutate(
DX = ifelse(DX == "CN", "CN",ifelse(DX
%in% c("MCI","Dementia"),"CI",NA)))
df_picked_m3$DX<-as.factor(df_picked_m3$DX)
df_picked_m3$PTGENDER<-as.factor(df_picked_m3$PTGENDER)
head(df_picked_m3[1:10],n=3)
}
if(METHOD_FEATURE_FLAG == 3){
pheno_data_m3 <- df_picked_m3[,phenotic_features_m3]
head(pheno_data_m3[,1:5],n=3)
design_m3 <- model.matrix(~0 + .,data=pheno_data_m3)
colnames(design_m3)[colnames(design_m3) == "DXCN"] <- "CN"
colnames(design_m3)[colnames(design_m3) == "DXCI"] <- "CI"
head(design_m3)
beta_values_m3 <- t(as.matrix(df_fs_method3[,featureName_CpGs]))
}
In order to perform the differential analysis, Differentially Methylated Positions (DMP), we have to define the contrast that we are interested in. In method 3 we focus on two groups (CN and CI), giving one contrast of interest.
if(METHOD_FEATURE_FLAG == 3){
fit_m3 <- lmFit(beta_values_m3, design_m3)
head(fit_m3$coefficients)
contrast.matrix <- makeContrasts(CI - CN, levels = design_m3)
fit2_m3 <- contrasts.fit(fit_m3, contrast.matrix)
# Apply the empirical Bayes step to get the differential statistics and p-values.
fit2_m3 <- eBayes(fit2_m3)
}
if(METHOD_FEATURE_FLAG == 3){
decideTests(fit2_m3)
}
if(METHOD_FEATURE_FLAG == 3){
dmp_results_m3_try1 <- decideTests(
fit2_m3, lfc = 0.01, adjust.method = "fdr", p.value = 0.1)
table(dmp_results_m3_try1)
}
if(METHOD_FEATURE_FLAG == 3){
# Identify DMPs, we will use this one:
dmp_results_m3 <- decideTests(
fit2_m3, lfc = 0.01, adjust.method = "none", p.value = 0.1)
table(dmp_results_m3)
}
if(METHOD_FEATURE_FLAG == 3){
significant_dmp_filter <- dmp_results_m3 != 0
significant_cpgs_m3_DMP <- rownames(dmp_results_m3)[
apply(significant_dmp_filter, 1, any)]
pickedFeatureName_m3_afterDMP<-c(phenotic_features_m3,significant_cpgs_m3_DMP)
df_picked_m3<-df_picked_m3[,pickedFeatureName_m3_afterDMP]
dim(df_picked_m3)
}
The “volcano plot” is one way to visualize the results of a differential analysis.
The x-axis shows the log-fold change in methylation levels between the two classes. The log fold change (logFC) can be calculated as \(\log_2 \left( \frac{\text{mean}(\text{Group1})}{\text{mean}(\text{Group2})} \right)\).
Interpretation of logFC:
A positive logFC indicates that the measurement is higher in the first group than in the second; here this means hypermethylation (an increase in methylation).
A negative logFC indicates that the measurement is lower in the first group than in the second; here this means hypomethylation (a decrease in methylation) in the experimental condition compared to the reference.
A logFC of 0 indicates no difference between the two groups.
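As a toy numeric illustration of the logFC formula above (the beta values below are invented for demonstration):

```r
# Toy illustration of the logFC formula; these beta values are invented.
group1 <- c(0.80, 0.70)  # methylation beta values in the first group
group2 <- c(0.40, 0.35)  # methylation beta values in the second group
logFC <- log2(mean(group1) / mean(group2))
logFC  # 1: mean(group1) is twice mean(group2), i.e. hypermethylation in group1
```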
The y-axis shows a measure of statistical significance, such as the log-odds or “B” statistic. In the following we will use the B statistic. The log-odds can be calculated as \(B = \log_e(\text{posterior odds})\).
Interpretation of the B-value:
A higher B-value indicates stronger evidence for differential methylation.
A lower (or negative) B-value indicates weaker evidence for differential methylation.
A B-value close to zero indicates uncertainty, or a lack of strong evidence, for differential methylation.
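For intuition (a standard property of log-odds, added here as an aside): since \(B\) is the natural log of the posterior odds, it maps directly to a posterior probability of differential methylation,
\[ p = \frac{e^{B}}{1 + e^{B}}, \]
so \(B = 0\) corresponds to \(p = 0.5\) (even odds), while \(B = 1.5\) corresponds to \(p \approx 0.82\).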
A characteristic “volcano” shape should be seen. Let’s look at the results:
if(METHOD_FEATURE_FLAG == 3){
full_results_m3 <- topTable(fit2_m3, number=Inf)
full_results_m3 <- tibble::rownames_to_column(full_results_m3,"ID")
head(full_results_m3)
}
if(METHOD_FEATURE_FLAG == 3){
sorted_full_results_m3 <- full_results_m3[
order(full_results_m3$logFC, decreasing = TRUE), ]
head(sorted_full_results_m3)
}
if(METHOD_FEATURE_FLAG == 3){
library(ggplot2)
ggplot(full_results_m3,aes(x = logFC, y=B)) + geom_point()
}
Now, let’s visualize the plot with the cutoffs:
if(METHOD_FEATURE_FLAG == 3){
library(dplyr)
library(ggrepel)
p_cutoff <- 0.1
fc_cutoff <- 0.01
topN <- 20
full_results_m3 <- full_results_m3 %>%
mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
mutate(Rank = rank(-abs(logFC)),
Label = ifelse(Rank <= topN, as.character(ID), ""))
ggplot(full_results_m3, aes(x = logFC,
y = B, col = Significant, label = Label)) +
geom_point() +
geom_text_repel(col = "black")
}
Now, let’s change the y-axis to the P value:
if(METHOD_FEATURE_FLAG == 3){
ggplot(full_results_m3,aes(x = logFC, y=-log10(P.Value))) + geom_point()
}
if(METHOD_FEATURE_FLAG == 3){
library(dplyr)
library(ggrepel)
p_cutoff <- 0.1
fc_cutoff <- 0.01
topN <- 20
full_results_m3 <- full_results_m3 %>%
mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
mutate(Rank = rank(-abs(logFC)),
Label = ifelse(Rank <= topN, as.character(ID), ""))
ggplot(full_results_m3,
aes(x = logFC, y = -log10(P.Value),
col = Significant,
label = Label)) +
geom_point() +
geom_text_repel(col = "black")
}
if(METHOD_FEATURE_FLAG == 3){
library(recipes)
rec <- recipe(DX ~ ., data = df_picked_m3) %>%
step_zv(all_predictors()) %>%
# step_range(all_numeric(), -all_outcomes()) %>%
step_dummy(all_nominal(), -all_outcomes())%>%
step_corr(all_predictors(), threshold = 0.7)
rec_prep <- prep(rec, df_picked_m3)
processed_data_m3 <- bake(rec_prep, new_data = df_picked_m3)
processed_data_m3_df <- as.data.frame(processed_data_m3)
rownames(processed_data_m3_df) <- rownames(df_picked_m3)
dim(processed_data_m3)
}
if(METHOD_FEATURE_FLAG == 3){
AfterProcess_FeatureName_m3<-colnames(processed_data_m3)
head(AfterProcess_FeatureName_m3)
tail(AfterProcess_FeatureName_m3)
}
if(METHOD_FEATURE_FLAG == 3){
levels(df_picked_m3$DX)
}
if(METHOD_FEATURE_FLAG == 3){
head(processed_data_m3[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 3){
lastColumn_NUM_m3<-dim(processed_data_m3)[2]
last5Column_NUM_m3<-lastColumn_NUM_m3-5
head(processed_data_m3[,last5Column_NUM_m3 :lastColumn_NUM_m3])
}
if(METHOD_FEATURE_FLAG == 3){
levels(processed_data_m3$DX)
}
In this method, only the CN and AD (Dementia) classes will be considered.
if(METHOD_FEATURE_FLAG == 4){
df_fs_method4<-clean_merged_df
}
if(METHOD_FEATURE_FLAG == 4){
phenotic_features_m4<-c(
"DX","age.now","PTGENDER","PC1","PC2","PC3")
pickedFeatureName_m4<-c(phenotic_features_m4,featureName_CpGs)
df_picked_m4<-df_fs_method4[,pickedFeatureName_m4]
df_picked_m4$DX<-as.factor(df_picked_m4$DX)
df_picked_m4$PTGENDER<-as.factor(df_picked_m4$PTGENDER)
head(df_picked_m4[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 4){
dim(df_picked_m4)
}
if(METHOD_FEATURE_FLAG == 4){
df_picked_m4<-df_picked_m4 %>% filter(DX != "MCI") %>% droplevels()
df_picked_m4$DX<-as.factor(df_picked_m4$DX)
df_picked_m4$PTGENDER<-as.factor(df_picked_m4$PTGENDER)
head(df_picked_m4[1:10],n=3)
}
if(METHOD_FEATURE_FLAG == 4){
print(dim(df_picked_m4))
print(table(df_picked_m4$DX))
}
if(METHOD_FEATURE_FLAG == 4){
df_fs_method4 <- df_fs_method4 %>% filter(DX != "MCI") %>% droplevels()
df_fs_method4$DX<-as.factor(df_fs_method4$DX)
print(head(df_fs_method4))
print(dim(df_fs_method4))
}
if(METHOD_FEATURE_FLAG == 4){
pheno_data_m4 <- df_picked_m4[,phenotic_features_m4]
print(head(pheno_data_m4[,1:5],n=3))
design_m4 <- model.matrix(~0 + .,data=pheno_data_m4)
colnames(design_m4)[colnames(design_m4) == "DXCN"] <- "CN"
colnames(design_m4)[colnames(design_m4) == "DXDementia"] <- "Dementia"
print(head(design_m4))
beta_values_m4 <- t(as.matrix(df_fs_method4[,featureName_CpGs]))
}
In order to perform the differential analysis, Differentially Methylated Positions (DMP), we have to define the contrast that we are interested in. In method 4 we focus on two groups (CN and Dementia), giving one contrast of interest.
if(METHOD_FEATURE_FLAG == 4){
fit_m4 <- lmFit(beta_values_m4, design_m4)
head(fit_m4$coefficients)
contrast.matrix <- makeContrasts(Dementia - CN, levels = design_m4)
fit2_m4 <- contrasts.fit(fit_m4, contrast.matrix)
# Apply the empirical Bayes step to get the differential statistics and p-values.
fit2_m4 <- eBayes(fit2_m4)
}
if(METHOD_FEATURE_FLAG == 4){
decideTests(fit2_m4)
}
if(METHOD_FEATURE_FLAG == 4){
dmp_results_m4_try1 <- decideTests(
fit2_m4, lfc = 0.01, adjust.method = "fdr", p.value = 0.1)
table(dmp_results_m4_try1)
}
The constraints are too strict; let’s relax them.
if(METHOD_FEATURE_FLAG == 4){
# Identify DMPs, we will use this one:
dmp_results_m4 <- decideTests(
fit2_m4, lfc = 0.01, adjust.method = "none", p.value = 0.1)
table(dmp_results_m4)
}
if(METHOD_FEATURE_FLAG == 4){
significant_dmp_filter <- dmp_results_m4 != 0
significant_cpgs_m4_DMP <- rownames(dmp_results_m4)[
apply(significant_dmp_filter, 1, any)]
pickedFeatureName_m4_afterDMP<-c(phenotic_features_m4,significant_cpgs_m4_DMP)
df_picked_m4<-df_picked_m4[,pickedFeatureName_m4_afterDMP]
dim(df_picked_m4)
}
The volcano plot here is read exactly as described for method 3 above: logFC (hyper- vs hypomethylation) on the x-axis and the B statistic (log posterior odds) on the y-axis. A characteristic “volcano” shape should be seen. Let’s look at the results:
if(METHOD_FEATURE_FLAG == 4){
full_results_m4 <- topTable(fit2_m4, number=Inf)
full_results_m4 <- tibble::rownames_to_column(full_results_m4,"ID")
head(full_results_m4)
}
if(METHOD_FEATURE_FLAG == 4){
sorted_full_results_m4 <- full_results_m4[
order(full_results_m4$logFC, decreasing = TRUE), ]
head(sorted_full_results_m4)
}
if(METHOD_FEATURE_FLAG == 4){
library(ggplot2)
ggplot(full_results_m4,aes(x = logFC, y=B)) + geom_point()
}
Now, let’s visualize the plot with the cutoffs:
if(METHOD_FEATURE_FLAG == 4){
library(dplyr)
library(ggrepel)
p_cutoff <- 0.1
fc_cutoff <- 0.01
topN <- 20
full_results_m4 <- full_results_m4 %>%
mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
mutate(Rank = rank(-abs(logFC)),
Label = ifelse(Rank <= topN, as.character(ID), ""))
ggplot(full_results_m4, aes(x = logFC,
y = B, col = Significant, label = Label)) +
geom_point() +
geom_text_repel(col = "black")
}
Now, let’s change the y-axis to the P value:
if(METHOD_FEATURE_FLAG == 4){
ggplot(full_results_m4,aes(x = logFC, y=-log10(P.Value))) + geom_point()
}
if(METHOD_FEATURE_FLAG == 4){
library(dplyr)
library(ggrepel)
p_cutoff <- 0.1
fc_cutoff <- 0.01
topN <- 20
full_results_m4 <- full_results_m4 %>%
mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
mutate(Rank = rank(-abs(logFC)),
Label = ifelse(Rank <= topN, as.character(ID), ""))
ggplot(full_results_m4,
aes(x = logFC, y = -log10(P.Value),
col = Significant,
label = Label)) +
geom_point() +
geom_text_repel(col = "black")
}
if(METHOD_FEATURE_FLAG == 4){
library(recipes)
rec <- recipe(DX ~ ., data = df_picked_m4) %>%
step_zv(all_predictors()) %>%
# step_range(all_numeric(), -all_outcomes()) %>%
step_dummy(all_nominal(), -all_outcomes())%>%
step_corr(all_predictors(), threshold = 0.7)
rec_prep <- prep(rec, df_picked_m4)
processed_data_m4 <- bake(rec_prep, new_data = df_picked_m4)
processed_data_m4_df <- as.data.frame(processed_data_m4)
rownames(processed_data_m4_df) <- rownames(df_picked_m4)
print(dim(processed_data_m4))
}
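To see exactly which predictors each recipe step dropped, the tidy() method on a prepped recipe reports this per step. A self-contained sketch on toy data (the data frame "toy_df" and its columns are hypothetical; the same tidy() calls apply to "rec_prep" above):

```r
library(recipes)

# Toy data: x3 has zero variance, x2 is perfectly correlated with x1.
toy_df <- data.frame(
  y  = factor(c("a", "b", "a", "b")),
  x1 = c(1, 2, 3, 4),
  x2 = c(2, 4, 6, 8),
  x3 = c(5, 5, 5, 5)
)
toy_rec <- recipe(y ~ ., data = toy_df) %>%
  step_zv(all_predictors()) %>%
  step_corr(all_predictors(), threshold = 0.7) %>%
  prep(toy_df)

tidy(toy_rec)               # one row per step, in the order added
tidy(toy_rec, number = 1)   # "x3" removed by the zero-variance filter
tidy(toy_rec, number = 2)   # one of the correlated pair removed by step_corr
```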
if(METHOD_FEATURE_FLAG == 4){
AfterProcess_FeatureName_m4<-colnames(processed_data_m4)
# Inside braces only the last expression auto-prints, so use print() explicitly
print(length(AfterProcess_FeatureName_m4))
print(head(AfterProcess_FeatureName_m4))
print(tail(AfterProcess_FeatureName_m4))
}
if(METHOD_FEATURE_FLAG == 4){
levels(df_picked_m4$DX)
}
if(METHOD_FEATURE_FLAG == 4){
lastColumn_NUM_m4<-dim(processed_data_m4)[2]
last5Column_NUM_m4<-lastColumn_NUM_m4-5
head(processed_data_m4[,last5Column_NUM_m4 :lastColumn_NUM_m4])
}
if(METHOD_FEATURE_FLAG == 4){
print(levels(processed_data_m4$DX))
print(dim(processed_data_m4))
}
In this method, only the CN and MCI classes will be considered.
if(METHOD_FEATURE_FLAG == 5){
df_fs_method5<-clean_merged_df
}
if(METHOD_FEATURE_FLAG == 5){
phenotic_features_m5<-c(
"DX","age.now","PTGENDER","PC1","PC2","PC3")
pickedFeatureName_m5<-c(phenotic_features_m5,featureName_CpGs)
df_picked_m5<-df_fs_method5[,pickedFeatureName_m5]
df_picked_m5$DX<-as.factor(df_picked_m5$DX)
df_picked_m5$PTGENDER<-as.factor(df_picked_m5$PTGENDER)
head(df_picked_m5[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 5){
dim(df_picked_m5)
}
if(METHOD_FEATURE_FLAG == 5){
df_picked_m5<-df_picked_m5 %>% filter(DX != "Dementia") %>% droplevels()
df_picked_m5$DX<-as.factor(df_picked_m5$DX)
df_picked_m5$PTGENDER<-as.factor(df_picked_m5$PTGENDER)
head(df_picked_m5[1:10],n=3)
}
if(METHOD_FEATURE_FLAG == 5){
print(dim(df_picked_m5))
print(table(df_picked_m5$DX))
}
if(METHOD_FEATURE_FLAG == 5){
df_fs_method5 <- df_fs_method5 %>% filter(DX != "Dementia") %>% droplevels()
df_fs_method5$DX<-as.factor(df_fs_method5$DX)
print(head(df_fs_method5))
print(dim(df_fs_method5))
}
if(METHOD_FEATURE_FLAG == 5){
pheno_data_m5 <- df_picked_m5[,phenotic_features_m5]
print(head(pheno_data_m5[,1:5],n=3))
design_m5 <- model.matrix(~0 + .,data=pheno_data_m5)
colnames(design_m5)[colnames(design_m5) == "DXCN"] <- "CN"
colnames(design_m5)[colnames(design_m5) == "DXMCI"] <- "MCI"
print(head(design_m5))
beta_values_m5 <- t(as.matrix(df_fs_method5[,featureName_CpGs]))
}
To perform the differential analysis, i.e., to identify Differentially Methylated Positions (DMPs), we have to define the contrast we are interested in. In method 5 we focus on two groups (CN and MCI), so there is one contrast of interest.
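Before running this on the real data, here is a toy illustration with synthetic values (all objects prefixed "toy_" are hypothetical) of what the "MCI - CN" contrast computes: with a ~0 + group design, the contrast coefficient for each CpG is simply the difference in group mean beta values.

```r
library(limma)

set.seed(1)
# 2 CpGs x 10 samples; columns 1-5 are CN, columns 6-10 are MCI
toy_beta  <- matrix(runif(20, 0.2, 0.9), nrow = 2,
                    dimnames = list(c("cpg_a", "cpg_b"), NULL))
toy_group <- factor(rep(c("CN", "MCI"), each = 5))

toy_design <- model.matrix(~0 + toy_group)
colnames(toy_design) <- levels(toy_group)

toy_fit  <- lmFit(toy_beta, toy_design)
toy_fit2 <- contrasts.fit(toy_fit, makeContrasts(MCI - CN, levels = toy_design))

toy_fit2$coefficients   # per-CpG contrast estimates
# ...equal the plain differences in group mean beta values:
rowMeans(toy_beta[, toy_group == "MCI"]) - rowMeans(toy_beta[, toy_group == "CN"])
```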
if(METHOD_FEATURE_FLAG == 5){
fit_m5 <- lmFit(beta_values_m5, design_m5)
print(head(fit_m5$coefficients))
contrast.matrix <- makeContrasts(MCI - CN, levels = design_m5)
fit2_m5 <- contrasts.fit(fit_m5, contrast.matrix)
# Apply the empirical Bayes step to get our differential methylation statistics and p-values.
fit2_m5 <- eBayes(fit2_m5)
}
if(METHOD_FEATURE_FLAG == 5){
decideTests(fit2_m5)
}
if(METHOD_FEATURE_FLAG == 5){
dmp_results_m5_try1 <- decideTests(
fit2_m5, lfc = 0.01, adjust.method = "fdr", p.value = 0.1)
table(dmp_results_m5_try1)
}
These constraints are too tight; let’s relax them.
if(METHOD_FEATURE_FLAG == 5){
# Identify DMPs, we will use this one:
dmp_results_m5 <- decideTests(
fit2_m5, lfc = 0.01, adjust.method = "none", p.value = 0.1)
table(dmp_results_m5)
}
if(METHOD_FEATURE_FLAG == 5){
significant_dmp_filter <- dmp_results_m5 != 0
significant_cpgs_m5_DMP <- rownames(dmp_results_m5)[
apply(significant_dmp_filter, 1, any)]
pickedFeatureName_m5_afterDMP<-c(phenotic_features_m5,significant_cpgs_m5_DMP)
df_picked_m5<-df_picked_m5[,pickedFeatureName_m5_afterDMP]
dim(df_picked_m5)
}
The “Volcano Plot” is one way to visualize the results of a differential methylation analysis.
The x-axis shows the log-fold change in methylation levels between the two classes. The log fold change (logFC) can be calculated as \(\log_2 \left( \frac{\text{mean}(\text{Group1})}{\text{mean}(\text{Group2})} \right)\).
Interpretation of logFC:
Positive logFC: Indicates that the measurement is higher in the first group than in the second; here this means hypermethylation (an increase in methylation).
Negative logFC: Indicates that the measurement is lower in the first group than in the second; here this means hypomethylation (a decrease in methylation) in the experimental condition compared to the reference.
logFC of 0: Indicates no difference in the measurement between the two groups.
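A quick worked example of this formula (illustrative numbers only, not taken from the data):

```r
# If Group1 has mean methylation 0.8 and Group2 has 0.4, then logFC = 1,
# i.e. a 2-fold higher level (hypermethylation) in Group1.
mean_g1 <- 0.8
mean_g2 <- 0.4
log2(mean_g1 / mean_g2)   # = 1
```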
The y-axis shows a measure of statistical significance, such as the log-odds, or “B” statistic. In the following, we will use the B statistic. The log-odds can be calculated as \(B = \log_e(\text{posterior odds})\).
Interpretation of B-value:
Higher B-value: Indicates stronger evidence for differential methylation.
Lower (or negative) B-value: Indicates weaker evidence for differential methylation.
B-value close to zero: Indicates uncertainty or lack of strong evidence for differential methylation.
A characteristic “volcano” shape should be seen. Let’s look at the results:
if(METHOD_FEATURE_FLAG == 5){
full_results_m5 <- topTable(fit2_m5, number=Inf)
full_results_m5 <- tibble::rownames_to_column(full_results_m5,"ID")
head(full_results_m5)
}
if(METHOD_FEATURE_FLAG == 5){
sorted_full_results_m5 <- full_results_m5[
order(full_results_m5$logFC, decreasing = TRUE), ]
head(sorted_full_results_m5)
}
if(METHOD_FEATURE_FLAG == 5){
library(ggplot2)
ggplot(full_results_m5,aes(x = logFC, y=B)) + geom_point()
}
Now, let’s visualize the plot with the cutoffs applied:
if(METHOD_FEATURE_FLAG == 5){
library(dplyr)
library(ggrepel)
p_cutoff <- 0.1
fc_cutoff <- 0.01
topN <- 20
full_results_m5 <- full_results_m5 %>%
mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
mutate(Rank = rank(-abs(logFC)),
Label = ifelse(Rank <= topN, as.character(ID), ""))
ggplot(full_results_m5, aes(x = logFC,
y = B, col = Significant, label = Label)) +
geom_point() +
geom_text_repel(col = "black")
}
Now, let’s change the y-axis to the P-value:
if(METHOD_FEATURE_FLAG == 5){
ggplot(full_results_m5,aes(x = logFC, y=-log10(P.Value))) + geom_point()
}
if(METHOD_FEATURE_FLAG == 5){
library(dplyr)
library(ggrepel)
p_cutoff <- 0.1
fc_cutoff <- 0.01
topN <- 20
full_results_m5 <- full_results_m5 %>%
mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
mutate(Rank = rank(-abs(logFC)),
Label = ifelse(Rank <= topN, as.character(ID), ""))
ggplot(full_results_m5,
aes(x = logFC, y = -log10(P.Value),
col = Significant,
label = Label)) +
geom_point() +
geom_text_repel(col = "black")
}
if(METHOD_FEATURE_FLAG == 5){
library(recipes)
rec <- recipe(DX ~ ., data = df_picked_m5) %>%
step_zv(all_predictors()) %>%
# step_range(all_numeric(), -all_outcomes()) %>%
step_dummy(all_nominal(), -all_outcomes())%>%
step_corr(all_predictors(), threshold = 0.7)
rec_prep <- prep(rec, df_picked_m5)
processed_data_m5 <- bake(rec_prep, new_data = df_picked_m5)
processed_data_m5_df <- as.data.frame(processed_data_m5)
rownames(processed_data_m5_df) <- rownames(df_picked_m5)
print(dim(processed_data_m5))
}
if(METHOD_FEATURE_FLAG == 5){
AfterProcess_FeatureName_m5<-colnames(processed_data_m5)
# Inside braces only the last expression auto-prints, so use print() explicitly
print(length(AfterProcess_FeatureName_m5))
print(head(AfterProcess_FeatureName_m5))
print(tail(AfterProcess_FeatureName_m5))
}
if(METHOD_FEATURE_FLAG == 5){
levels(df_picked_m5$DX)
}
if(METHOD_FEATURE_FLAG == 5){
lastColumn_NUM_m5<-dim(processed_data_m5)[2]
last5Column_NUM_m5<-lastColumn_NUM_m5-5
head(processed_data_m5[,last5Column_NUM_m5 :lastColumn_NUM_m5])
}
if(METHOD_FEATURE_FLAG == 5){
print(levels(processed_data_m5$DX))
print(dim(processed_data_m5))
}
In this method, only the MCI and Dementia (AD) classes will be considered.
if(METHOD_FEATURE_FLAG == 6){
df_fs_method6<-clean_merged_df
}
if(METHOD_FEATURE_FLAG == 6){
phenotic_features_m6<-c(
"DX","age.now","PTGENDER","PC1","PC2","PC3")
pickedFeatureName_m6<-c(phenotic_features_m6,featureName_CpGs)
df_picked_m6<-df_fs_method6[,pickedFeatureName_m6]
df_picked_m6$DX<-as.factor(df_picked_m6$DX)
df_picked_m6$PTGENDER<-as.factor(df_picked_m6$PTGENDER)
head(df_picked_m6[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 6){
dim(df_picked_m6)
}
if(METHOD_FEATURE_FLAG == 6){
df_picked_m6<-df_picked_m6 %>% filter(DX != "CN") %>% droplevels()
df_picked_m6$DX<-as.factor(df_picked_m6$DX)
df_picked_m6$PTGENDER<-as.factor(df_picked_m6$PTGENDER)
head(df_picked_m6[1:10],n=3)
}
if(METHOD_FEATURE_FLAG == 6){
print(dim(df_picked_m6))
print(table(df_picked_m6$DX))
}
if(METHOD_FEATURE_FLAG == 6){
df_fs_method6 <- df_fs_method6 %>% filter(DX != "CN") %>% droplevels()
df_fs_method6$DX<-as.factor(df_fs_method6$DX)
print(head(df_fs_method6))
print(dim(df_fs_method6))
}
if(METHOD_FEATURE_FLAG == 6){
pheno_data_m6 <- df_picked_m6[,phenotic_features_m6]
print(head(pheno_data_m6[,1:5],n=3))
design_m6 <- model.matrix(~0 + .,data=pheno_data_m6)
colnames(design_m6)[colnames(design_m6) == "DXDementia"] <- "Dementia"
colnames(design_m6)[colnames(design_m6) == "DXMCI"] <- "MCI"
print(head(design_m6))
beta_values_m6 <- t(as.matrix(df_fs_method6[,featureName_CpGs]))
}
To perform the differential analysis, i.e., to identify Differentially Methylated Positions (DMPs), we have to define the contrast we are interested in. In method 6 we focus on two groups (MCI and Dementia), so there is one contrast of interest.
if(METHOD_FEATURE_FLAG == 6){
fit_m6 <- lmFit(beta_values_m6, design_m6)
print(head(fit_m6$coefficients))
contrast.matrix <- makeContrasts(MCI - Dementia, levels = design_m6)
fit2_m6 <- contrasts.fit(fit_m6, contrast.matrix)
# Apply the empirical Bayes step to get our differential methylation statistics and p-values.
fit2_m6 <- eBayes(fit2_m6)
}
if(METHOD_FEATURE_FLAG == 6){
decideTests(fit2_m6)
}
if(METHOD_FEATURE_FLAG == 6){
dmp_results_m6_try1 <- decideTests(
fit2_m6, lfc = 0.01, adjust.method = "fdr", p.value = 0.1)
table(dmp_results_m6_try1)
}
These constraints are too tight; let’s relax them.
if(METHOD_FEATURE_FLAG == 6){
# Identify DMPs, we will use this one:
dmp_results_m6 <- decideTests(
fit2_m6, lfc = 0.01, adjust.method = "none", p.value = 0.1)
table(dmp_results_m6)
}
if(METHOD_FEATURE_FLAG == 6){
significant_dmp_filter <- dmp_results_m6 != 0
significant_cpgs_m6_DMP <- rownames(dmp_results_m6)[
apply(significant_dmp_filter, 1, any)]
pickedFeatureName_m6_afterDMP<-c(phenotic_features_m6,significant_cpgs_m6_DMP)
df_picked_m6<-df_picked_m6[,pickedFeatureName_m6_afterDMP]
dim(df_picked_m6)
}
The “Volcano Plot” is one way to visualize the results of a differential methylation analysis.
The x-axis shows the log-fold change in methylation levels between the two classes. The log fold change (logFC) can be calculated as \(\log_2 \left( \frac{\text{mean}(\text{Group1})}{\text{mean}(\text{Group2})} \right)\).
Interpretation of logFC:
Positive logFC: Indicates that the measurement is higher in the first group than in the second; here this means hypermethylation (an increase in methylation).
Negative logFC: Indicates that the measurement is lower in the first group than in the second; here this means hypomethylation (a decrease in methylation) in the experimental condition compared to the reference.
logFC of 0: Indicates no difference in the measurement between the two groups.
The y-axis shows a measure of statistical significance, such as the log-odds, or “B” statistic. In the following, we will use the B statistic. The log-odds can be calculated as \(B = \log_e(\text{posterior odds})\).
Interpretation of B-value:
Higher B-value: Indicates stronger evidence for differential methylation.
Lower (or negative) B-value: Indicates weaker evidence for differential methylation.
B-value close to zero: Indicates uncertainty or lack of strong evidence for differential methylation.
A characteristic “volcano” shape should be seen. Let’s look at the results:
if(METHOD_FEATURE_FLAG == 6){
full_results_m6 <- topTable(fit2_m6, number=Inf)
full_results_m6 <- tibble::rownames_to_column(full_results_m6,"ID")
head(full_results_m6)
}
if(METHOD_FEATURE_FLAG == 6){
sorted_full_results_m6 <- full_results_m6[
order(full_results_m6$logFC, decreasing = TRUE), ]
head(sorted_full_results_m6)
}
if(METHOD_FEATURE_FLAG == 6){
library(ggplot2)
ggplot(full_results_m6,aes(x = logFC, y=B)) + geom_point()
}
Now, let’s visualize the plot with the cutoffs applied:
if(METHOD_FEATURE_FLAG == 6){
library(dplyr)
library(ggrepel)
p_cutoff <- 0.1
fc_cutoff <- 0.01
topN <- 20
full_results_m6 <- full_results_m6 %>%
mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
mutate(Rank = rank(-abs(logFC)),
Label = ifelse(Rank <= topN, as.character(ID), ""))
ggplot(full_results_m6, aes(x = logFC,
y = B, col = Significant, label = Label)) +
geom_point() +
geom_text_repel(col = "black")
}
Now, let’s change the y-axis to the P-value:
if(METHOD_FEATURE_FLAG == 6){
ggplot(full_results_m6,aes(x = logFC, y=-log10(P.Value))) + geom_point()
}
if(METHOD_FEATURE_FLAG == 6){
library(dplyr)
library(ggrepel)
p_cutoff <- 0.1
fc_cutoff <- 0.01
topN <- 20
full_results_m6 <- full_results_m6 %>%
mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
mutate(Rank = rank(-abs(logFC)),
Label = ifelse(Rank <= topN, as.character(ID), ""))
ggplot(full_results_m6,
aes(x = logFC, y = -log10(P.Value),
col = Significant,
label = Label)) +
geom_point() +
geom_text_repel(col = "black")
}
if(METHOD_FEATURE_FLAG == 6){
library(recipes)
rec <- recipe(DX ~ ., data = df_picked_m6) %>%
step_zv(all_predictors()) %>%
# step_range(all_numeric(), -all_outcomes()) %>%
step_dummy(all_nominal(), -all_outcomes())%>%
step_corr(all_predictors(), threshold = 0.7)
rec_prep <- prep(rec, df_picked_m6)
processed_data_m6 <- bake(rec_prep, new_data = df_picked_m6)
processed_data_m6_df <- as.data.frame(processed_data_m6)
rownames(processed_data_m6_df) <- rownames(df_picked_m6)
print(dim(processed_data_m6))
}
if(METHOD_FEATURE_FLAG == 6){
AfterProcess_FeatureName_m6<-colnames(processed_data_m6)
# Inside braces only the last expression auto-prints, so use print() explicitly
print(length(AfterProcess_FeatureName_m6))
print(head(AfterProcess_FeatureName_m6))
print(tail(AfterProcess_FeatureName_m6))
}
if(METHOD_FEATURE_FLAG == 6){
levels(df_picked_m6$DX)
}
if(METHOD_FEATURE_FLAG == 6){
lastColumn_NUM_m6<-dim(processed_data_m6)[2]
last5Column_NUM_m6<-lastColumn_NUM_m6-5
head(processed_data_m6[,last5Column_NUM_m6 :lastColumn_NUM_m6])
}
if(METHOD_FEATURE_FLAG == 6){
print(levels(processed_data_m6$DX))
print(dim(processed_data_m6))
}
The name of “processed_data” can be one of:
“processed_data_m1”, which uses method one to process the data.
“processed_data_m2”, which uses method two to process the data; note that the features will be principal components.
“processed_data_m3”, which uses method three to process the data. This method transfers “DX” to a binary class: “CN” stays the same, and “MCI” and “Dementia” are merged into “CI”.
Note that “processed_data_m3_df” is the data-frame form of “processed_data_m3” with sample names as row names.
“processed_data_m4”, which uses method four to process the data. This method filters “DX” (drops the “MCI” class), limiting the data to the CN and Dementia (AD) classes.
“processed_data_m5”, which uses method five to process the data. This method filters “DX” (drops the “Dementia” class), limiting the data to the CN and MCI classes.
“processed_data_m6”, which uses method six to process the data. This method filters “DX” (drops the “CN” class), limiting the data to the MCI and Dementia classes.
The name of “AfterProcess_FeatureName” (which includes the “DX” label) follows the same pattern:
if(METHOD_FEATURE_FLAG==1){
processed_dataFrame<-processed_data_m1_df
processed_data<-processed_data_m1
AfterProcess_FeatureName<-AfterProcess_FeatureName_m1
}
if(METHOD_FEATURE_FLAG==2){
processed_dataFrame<-processed_data_m2_df
processed_data<-processed_data_m2
AfterProcess_FeatureName<-AfterProcess_FeatureName_m2
}
if(METHOD_FEATURE_FLAG==3){
processed_dataFrame<-processed_data_m3_df
processed_data<-processed_data_m3
AfterProcess_FeatureName<-AfterProcess_FeatureName_m3
}
if(METHOD_FEATURE_FLAG==4){
processed_dataFrame<-processed_data_m4_df
processed_data<-processed_data_m4
AfterProcess_FeatureName<-AfterProcess_FeatureName_m4
}
if(METHOD_FEATURE_FLAG==5){
processed_dataFrame<-processed_data_m5_df
processed_data<-processed_data_m5
AfterProcess_FeatureName<-AfterProcess_FeatureName_m5
}
if(METHOD_FEATURE_FLAG==6){
processed_dataFrame<-processed_data_m6_df
processed_data<-processed_data_m6
AfterProcess_FeatureName<-AfterProcess_FeatureName_m6
}
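As an aside, the six if blocks above can be collapsed into a single name lookup. A sketch (the helper "pick_by_flag" is hypothetical; it assumes the three objects for the chosen flag already exist in the workspace):

```r
# Build the per-method object names from the flag and fetch them with get().
# Behaves the same as the if-blocks above for METHOD_FEATURE_FLAG in 1:6.
pick_by_flag <- function(flag) {
  suffix <- paste0("_m", flag)
  list(
    processed_dataFrame      = get(paste0("processed_data", suffix, "_df")),
    processed_data           = get(paste0("processed_data", suffix)),
    AfterProcess_FeatureName = get(paste0("AfterProcess_FeatureName", suffix))
  )
}
# picked <- pick_by_flag(METHOD_FEATURE_FLAG)
```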
print(head(processed_dataFrame))
## age.now PC1 PC2 PC3 cg02621446 cg23916408
## 200223270003_R02C01 82.40000 -0.214185447 1.470293e-02 -0.014043316 0.8731313 0.1942275
## 200223270003_R03C01 78.60000 -0.172761185 5.745834e-02 0.005055871 0.8095534 0.9154993
## 200223270003_R06C01 80.40000 -0.003667305 8.372861e-02 0.029143653 0.7511582 0.8886255
## 200223270003_R07C01 78.16441 -0.186779607 -1.117250e-02 -0.032302430 0.8773609 0.8872447
## 200223270006_R01C01 62.90000 0.026814649 1.650735e-05 0.052947950 0.2046541 0.2219945
## 200223270006_R04C01 80.67796 -0.037862929 1.571950e-02 -0.008685676 0.7963817 0.1520624
## ... (wrapped output for the remaining 144 CpG columns omitted) ...
## cg06697310 cg20507276 cg27577781 cg20685672 cg03660162 DX
## 200223270003_R02C01 0.8454609 0.12238910 0.8143535 0.67121006 0.8691767 MCI
## 200223270003_R03C01 0.8653044 0.38721972 0.8113185 0.79320906 0.5160770 CN
## 200223270003_R06C01 0.2405168 0.47978438 0.8144274 0.66136456 0.9026304 CN
## 200223270003_R07C01 0.8479193 0.02261996 0.7970617 0.80838304 0.5305691 Dementia
## 200223270006_R01C01 0.8206613 0.37465798 0.8640044 0.08291414 0.9257451 MCI
## 200223270006_R04C01 0.7839595 0.03570795 0.8840237 0.84460055 0.8935772 CN
print(dim(processed_dataFrame))
## [1] 648 156
print(length(AfterProcess_FeatureName))
## [1] 156
print(head(processed_data))
## # A tibble: 6 × 156
## age.now PC1 PC2 PC3 cg02621446 cg23916408 cg12146221 cg05234269 cg14293999
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 82.4 -0.214 0.0147 -0.0140 0.873 0.194 0.205 0.938 0.284
## 2 78.6 -0.173 0.0575 0.00506 0.810 0.915 0.181 0.575 0.917
## 3 80.4 -0.00367 0.0837 0.0291 0.751 0.889 0.862 0.0247 0.917
## 4 78.2 -0.187 -0.0112 -0.0323 0.877 0.887 0.124 0.565 0.919
## 5 62.9 0.0268 0.0000165 0.0529 0.205 0.222 0.202 0.948 0.197
## 6 80.7 -0.0379 0.0157 -0.00869 0.796 0.152 0.138 0.563 0.903
## # ℹ 147 more variables: cg19377607 <dbl>, cg14307563 <dbl>, cg21209485 <dbl>, cg11331837 <dbl>,
## # cg11187460 <dbl>, cg14564293 <dbl>, cg12012426 <dbl>, cg00999469 <dbl>, cg17421046 <dbl>,
## # cg27639199 <dbl>, cg24851651 <dbl>, cg16788319 <dbl>, cg25879395 <dbl>, cg18339359 <dbl>,
## # cg12284872 <dbl>, cg24506579 <dbl>, cg05321907 <dbl>, cg10985055 <dbl>, cg20139683 <dbl>,
## # cg10750306 <dbl>, cg01667144 <dbl>, cg27341708 <dbl>, cg12466610 <dbl>, cg03327352 <dbl>,
## # cg02320265 <dbl>, cg08779649 <dbl>, cg13885788 <dbl>, cg25561557 <dbl>, cg01413796 <dbl>,
## # cg26069044 <dbl>, cg03088219 <dbl>, cg12682323 <dbl>, cg17738613 <dbl>, cg17186592 <dbl>, …
print(dim(processed_data))
## [1] 648 156
print(AfterProcess_FeatureName)
## [1] "age.now" "PC1" "PC2" "PC3" "cg02621446" "cg23916408" "cg12146221"
## [8] "cg05234269" "cg14293999" "cg19377607" "cg14307563" "cg21209485" "cg11331837" "cg11187460"
## [15] "cg14564293" "cg12012426" "cg00999469" "cg17421046" "cg27639199" "cg24851651" "cg16788319"
## [22] "cg25879395" "cg18339359" "cg12284872" "cg24506579" "cg05321907" "cg10985055" "cg20139683"
## [29] "cg10750306" "cg01667144" "cg27341708" "cg12466610" "cg03327352" "cg02320265" "cg08779649"
## [36] "cg13885788" "cg25561557" "cg01413796" "cg26069044" "cg03088219" "cg12682323" "cg17738613"
## [43] "cg17186592" "cg17906851" "cg01933473" "cg16771215" "cg11438323" "cg27086157" "cg15535896"
## [50] "cg18821122" "cg05841700" "cg10738648" "cg16579946" "cg20370184" "cg12784167" "cg15633912"
## [57] "cg02494911" "cg21854924" "cg25436480" "cg12534577" "cg15865722" "cg06864789" "cg24859648"
## [64] "cg16178271" "cg00675157" "cg10369879" "cg22274273" "cg01128042" "cg08198851" "cg04412904"
## [71] "cg11227702" "cg20913114" "cg02932958" "cg00962106" "cg15775217" "cg21697769" "cg16715186"
## [78] "cg00696044" "cg12738248" "cg01013522" "cg00616572" "cg05096415" "cg01153376" "cg09854620"
## [85] "cg24861747" "cg19512141" "cg06378561" "cg02356645" "cg02372404" "cg06950937" "cg00272795"
## [92] "cg03071582" "cg08584917" "cg23161429" "cg07138269" "cg13080267" "cg25758034" "cg23658987"
## [99] "cg25259265" "cg14924512" "cg14710850" "cg06118351" "cg07480176" "cg08857872" "cg20678988"
## [106] "cg24873924" "cg01921484" "cg12776173" "cg00247094" "cg03084184" "cg01549082" "cg26948066"
## [113] "cg07523188" "cg26474732" "cg11133939" "cg02225060" "cg12279734" "cg10240127" "cg23432430"
## [120] "cg16652920" "cg06112204" "cg12228670" "cg19503462" "cg07028768" "cg14240646" "cg09584650"
## [127] "cg27272246" "cg16749614" "cg04664583" "cg26757229" "cg03982462" "cg06715136" "cg15501526"
## [134] "cg04248279" "cg01680303" "cg06536614" "cg26219488" "cg18819889" "cg05570109" "cg02981548"
## [141] "cg08861434" "cg00689685" "cg17429539" "cg00322003" "cg11247378" "cg07152869" "cg00154902"
## [148] "cg14527649" "cg27452255" "cg03129555" "cg06697310" "cg20507276" "cg27577781" "cg20685672"
## [155] "cg03660162" "DX"
print("Number of Features :")
## [1] "Number of Features :"
Num_feaForProcess = length(AfterProcess_FeatureName)-1 # exclude the "DX" label
print(Num_feaForProcess)
## [1] 155
df_LRM1<-processed_data
featureName_LRM1<-AfterProcess_FeatureName
library(glmnet)
library(caret)
set.seed(123) # for reproducibility
trainIndex <- createDataPartition(df_LRM1$DX, p = 0.7, list = FALSE)
trainData <- df_LRM1[trainIndex, ]
testData <- df_LRM1[-trainIndex, ]
dim(trainData)
## [1] 455 156
dim(testData)
## [1] 193 156
ctrl <- trainControl(method = "cv", number = 5)
model_LRM1 <- caret::train(DX ~ ., data = trainData, method = "glmnet", trControl = ctrl)
predictions <- predict(model_LRM1, newdata = testData,type="raw")
cm_modelTrain_LRM1 <- caret::confusionMatrix(predictions, testData$DX)
print(cm_modelTrain_LRM1)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN Dementia MCI
## CN 46 7 14
## Dementia 3 10 4
## MCI 17 11 81
##
## Overall Statistics
##
## Accuracy : 0.7098
## 95% CI : (0.6403, 0.7728)
## No Information Rate : 0.513
## P-Value [Acc > NIR] : 2.018e-08
##
## Kappa : 0.4987
##
## Mcnemar's Test P-Value : 0.1607
##
## Statistics by Class:
##
## Class: CN Class: Dementia Class: MCI
## Sensitivity 0.6970 0.35714 0.8182
## Specificity 0.8346 0.95758 0.7021
## Pos Pred Value 0.6866 0.58824 0.7431
## Neg Pred Value 0.8413 0.89773 0.7857
## Prevalence 0.3420 0.14508 0.5130
## Detection Rate 0.2383 0.05181 0.4197
## Detection Prevalence 0.3472 0.08808 0.5648
## Balanced Accuracy 0.7658 0.65736 0.7602
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
cm_modelTrain_LRM1_Accuracy<-cm_modelTrain_LRM1$overall["Accuracy"]
cm_modelTrain_LRM1_Kappa<-cm_modelTrain_LRM1$overall["Kappa"]
print(cm_modelTrain_LRM1_Accuracy)
## Accuracy
## 0.7098446
print(cm_modelTrain_LRM1_Kappa)
## Kappa
## 0.4987013
print(model_LRM1)
## glmnet
##
## 455 samples
## 155 predictors
## 3 classes: 'CN', 'Dementia', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 364, 365, 363, 364, 364
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0.10 0.0001810831 0.6350263 0.3962356
## 0.10 0.0018108309 0.6482375 0.4142573
## 0.10 0.0181083090 0.6548792 0.4144240
## 0.55 0.0001810831 0.6263550 0.3765308
## 0.55 0.0018108309 0.6505792 0.4121576
## 0.55 0.0181083090 0.6461597 0.3827291
## 1.00 0.0001810831 0.6087233 0.3485056
## 1.00 0.0018108309 0.6417152 0.3949776
## 1.00 0.0181083090 0.5867925 0.2663062
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.01810831.
train_predictions <- predict(model_LRM1, newdata = trainData, type = "raw")
train_accuracy <- mean(train_predictions == trainData$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.96043956043956"
modelTrain_LRM1_trainAccuracy<-train_accuracy
print(modelTrain_LRM1_trainAccuracy)
## [1] 0.9604396
mean_accuracy_model_LRM1 <- mean(model_LRM1$results$Accuracy)
modelTrain_mean_accuracy_cv_LRM1 <- mean_accuracy_model_LRM1
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(modelTrain_mean_accuracy_cv_LRM1)
## [1] 0.6331631
library(caret)
library(pROC)
if (METHOD_FEATURE_FLAG ==5){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
print(roc_curve)
print("The auc value is:")
print(auc_value)
modelTrain_LRM1_AUC <- auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==4 || METHOD_FEATURE_FLAG==6 ){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
print(roc_curve)
print("The auc value is:")
print(auc_value)
modelTrain_LRM1_AUC <- auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==3){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
print(roc_curve)
print("The auc value is:")
print(auc_value)
modelTrain_LRM1_AUC <- auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==1){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData$DX)
for (class in classes) {
binary_labels <- ifelse(testData$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.8487
## The AUC value for class CN is: 0.8487235
##
## Class: Dementia
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.831
## The AUC value for class Dementia is: 0.8309524
##
## Class: MCI
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.8189
## The AUC value for class MCI is: 0.818934
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
modelTrain_LRM1_AUC <- mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.83287
print(modelTrain_LRM1_AUC)
## [1] 0.83287
importance_model_LRM1 <- varImp(model_LRM1)
print(importance_model_LRM1)
## glmnet variable importance
##
## variables are sorted by maximum importance across the classes
## only 20 most important variables shown (out of 155)
##
## CN Dementia MCI
## PC1 90.434 100.000 0.000
## PC2 46.684 78.708 0.000
## PC3 6.233 0.000 68.068
## cg00962106 63.055 11.820 36.946
## cg02225060 23.026 12.636 51.157
## cg14710850 49.611 8.376 25.404
## cg27452255 49.043 17.847 11.831
## cg02981548 26.242 5.624 49.019
## cg08861434 48.632 0.000 42.794
## cg19503462 25.906 48.114 5.790
## cg07152869 27.976 46.742 1.373
## cg16749614 11.544 17.975 45.949
## cg05096415 1.401 44.886 28.916
## cg23432430 44.232 3.492 25.272
## cg17186592 3.086 41.991 26.679
## cg00247094 15.870 41.652 10.423
## cg09584650 41.417 6.520 18.544
## cg11133939 24.223 0.000 40.464
## cg16715186 39.191 7.687 17.054
## cg03129555 12.434 38.574 8.410
plot(importance_model_LRM1, top = 20, main = "Variable Importance Plot")
importance_model_LRM1_df<-importance_model_LRM1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 ||METHOD_FEATURE_FLAG==6 ){
importance_final_model_LRM1 <- varImp(model_LRM1$finalModel)
library(dplyr)
ordered_importance_final_model_LRM1 <- importance_final_model_LRM1 %>% arrange(desc(Overall))
print(ordered_importance_final_model_LRM1)
}
if(METHOD_FEATURE_FLAG==1){
# For the multi-class classification case, keep each feature's
# maximum importance value across the three classes
importance_model_LRM1_df$Feature<-rownames(importance_model_LRM1_df)
importance_model_LRM1_df <- importance_model_LRM1_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_model_LRM1_df)
}
## CN Dementia MCI Feature MaxImportance
## 1 90.4344829 1.000000e+02 0.00000000 PC1 100.0000000
## 2 46.6839222 7.870805e+01 0.00000000 PC2 78.7080496
## 3 6.2328162 0.000000e+00 68.06814488 PC3 68.0681449
## 4 63.0552396 1.182025e+01 36.94618316 cg00962106 63.0552396
## 5 23.0256423 1.263604e+01 51.15668910 cg02225060 51.1566891
## 6 49.6114597 8.375718e+00 25.40447309 cg14710850 49.6114597
## 7 49.0426664 1.784727e+01 11.83094250 cg27452255 49.0426664
## 8 26.2421391 5.623593e+00 49.01884510 cg02981548 49.0188451
## 9 48.6316741 0.000000e+00 42.79414741 cg08861434 48.6316741
## 10 25.9056334 4.811382e+01 5.79002920 cg19503462 48.1138238
## 11 27.9758274 4.674230e+01 1.37255301 cg07152869 46.7422954
## 12 11.5439720 1.797550e+01 45.94910047 cg16749614 45.9491005
## 13 1.4007011 4.488618e+01 28.91575860 cg05096415 44.8861773
## 14 44.2318100 3.492439e+00 25.27236685 cg23432430 44.2318100
## 15 3.0857895 4.199093e+01 26.67887307 cg17186592 41.9909350
## 16 15.8701456 4.165152e+01 10.42296023 cg00247094 41.6515174
## 17 41.4170473 6.519659e+00 18.54359467 cg09584650 41.4170473
## 18 24.2229738 0.000000e+00 40.46366699 cg11133939 40.4636670
## 19 39.1906814 7.687132e+00 17.05368074 cg16715186 39.1906814
## 20 12.4335216 3.857431e+01 8.41047510 cg03129555 38.5743135
## 21 3.1975305 2.009482e+01 38.48178203 cg08857872 38.4817820
## 22 12.1267569 3.681366e+01 11.11575266 cg06864789 36.8136560
## 23 0.0000000 3.526450e+01 26.75307191 cg14924512 35.2645029
## 24 7.2034302 1.187744e+01 34.91240716 cg16652920 34.9124072
## 25 19.0892781 3.462629e+01 0.00000000 cg03084184 34.6262858
## 26 3.6595748 1.337504e+01 34.17533403 cg26219488 34.1753340
## 27 13.4696898 3.379055e+01 6.05974339 cg20913114 33.7905520
## 28 7.1339389 3.347220e+01 11.82338431 cg06378561 33.4722024
## 29 33.3316253 1.550029e+01 2.09187016 cg26948066 33.3316253
## 30 0.5622498 3.328357e+01 17.47528280 cg25259265 33.2835690
## 31 33.2211736 0.000000e+00 21.60293744 cg06536614 33.2211736
## 32 1.6519657 3.232403e+01 17.24679048 cg24859648 32.3240298
## 33 12.7595657 3.077640e+01 2.20542082 cg12279734 30.7764036
## 34 30.6984661 1.115898e+01 2.50462012 cg03982462 30.6984661
## 35 1.2155507 3.061490e+01 16.61316072 cg05841700 30.6149031
## 36 29.8250204 7.639480e+00 7.72771336 cg11227702 29.8250204
## 37 25.3395122 0.000000e+00 29.02900714 cg12146221 29.0290071
## 38 9.6443024 8.947027e+00 28.92927076 cg02621446 28.9292708
## 39 0.0000000 2.258056e+01 28.83675421 cg00616572 28.8367542
## 40 28.4405639 8.992560e+00 6.53191286 cg15535896 28.4405639
## 41 25.4461478 0.000000e+00 28.22422237 cg02372404 28.2242224
## 42 5.0485282 2.776085e+01 8.13030308 cg09854620 27.7608463
## 43 27.6118761 0.000000e+00 15.84264987 cg04248279 27.6118761
## 44 4.0112641 7.691476e+00 27.53732819 cg20678988 27.5373282
## 45 0.0000000 2.751128e+01 13.85128088 cg24861747 27.5112819
## 46 27.4916653 1.565730e+01 0.00000000 cg10240127 27.4916653
## 47 7.7673556 7.237543e+00 27.22040905 cg16771215 27.2204090
## 48 0.6494625 2.697143e+01 14.64691461 cg01667144 26.9714309
## 49 26.9340775 8.940726e+00 2.80991093 cg13080267 26.9340775
## 50 0.0000000 2.615117e+01 26.58162739 cg02494911 26.5816274
## 51 9.3828836 2.645607e+01 5.12487040 cg10750306 26.4560652
## 52 25.4477953 1.210178e+00 11.25303029 cg11438323 25.4477953
## 53 4.8678332 4.041647e+00 25.42751373 cg06715136 25.4275137
## 54 25.1189917 0.000000e+00 15.38030487 cg04412904 25.1189917
## 55 4.7675523 2.483637e+01 5.38989554 cg12738248 24.8363747
## 56 24.4204434 0.000000e+00 18.67541535 cg03071582 24.4204434
## 57 0.0000000 2.428213e+01 15.80040203 cg05570109 24.2821316
## 58 24.2207010 2.027864e+01 0.00000000 cg15775217 24.2207010
## 59 0.0000000 1.993091e+01 24.20455571 cg24873924 24.2045557
## 60 7.5573086 4.150281e+00 24.12319507 cg17738613 24.1231951
## 61 23.8473094 0.000000e+00 20.77999536 cg01921484 23.8473094
## 62 0.0000000 1.632489e+01 23.69841380 cg10369879 23.6984138
## 63 0.0000000 1.842186e+01 23.64630853 cg27341708 23.6463085
## 64 0.0000000 2.356446e+01 21.42456671 cg12534577 23.5644638
## 65 0.0000000 2.340991e+01 17.84326042 cg18821122 23.4099060
## 66 4.6199090 6.918591e+00 23.35218072 cg12682323 23.3521807
## 67 23.3278818 0.000000e+00 14.17259995 cg05234269 23.3278818
## 68 23.0176353 0.000000e+00 22.81015000 cg20685672 23.0176353
## 69 20.3601152 0.000000e+00 22.85377420 cg12228670 22.8537742
## 70 22.7096273 3.661922e+00 8.33669238 cg11331837 22.7096273
## 71 0.0000000 2.268978e+01 20.85857948 cg01680303 22.6897830
## 72 22.4120115 1.162854e+00 10.22545813 cg17421046 22.4120115
## 73 22.2804698 8.055739e+00 2.25662360 cg03088219 22.2804698
## 74 22.2513002 1.529470e+01 0.00000000 cg02356645 22.2513002
## 75 22.2504181 1.930716e+01 0.00000000 cg00322003 22.2504181
## 76 5.9019825 2.209156e+01 1.27018775 cg01013522 22.0915627
## 77 12.6358759 0.000000e+00 21.79598580 cg00272795 21.7959858
## 78 21.6466651 0.000000e+00 14.52658200 cg25758034 21.6466651
## 79 4.7731408 2.161796e+01 1.17837356 cg26474732 21.6179639
## 80 0.0000000 2.127023e+01 17.62935554 cg16579946 21.2702334
## 81 9.6112523 2.119858e+01 0.00000000 cg07523188 21.1985800
## 82 21.1973751 4.527337e+00 5.64210393 cg11187460 21.1973751
## 83 0.0000000 1.704269e+01 20.81174411 cg14527649 20.8117441
## 84 2.7202966 4.869331e+00 20.54066769 cg20370184 20.5406677
## 85 20.5303029 0.000000e+00 13.70608410 cg17429539 20.5303029
## 86 0.0000000 2.028240e+01 10.01345098 cg20507276 20.2824035
## 87 1.1867078 6.814770e+00 20.19281595 cg13885788 20.1928160
## 88 0.0000000 1.558541e+01 20.05749322 cg16178271 20.0574932
## 89 5.5930964 1.529211e+00 19.98634450 cg10738648 19.9863445
## 90 5.1548765 1.992074e+01 2.74954989 cg26069044 19.9207402
## 91 3.2054503 4.951311e+00 19.79635640 cg25879395 19.7963564
## 92 19.6439491 0.000000e+00 12.11785893 cg06112204 19.6439491
## 93 3.2324983 1.921062e+01 1.24898511 cg23161429 19.2106229
## 94 19.0283759 0.000000e+00 8.87014717 cg25436480 19.0283759
## 95 18.8728946 1.899214e+01 0.00000000 cg26757229 18.9921399
## 96 18.8606945 8.141903e+00 0.00000000 cg02932958 18.8606945
## 97 6.3337445 1.861804e+01 0.95397288 cg18339359 18.6180445
## 98 12.0316035 1.860484e+01 0.00000000 cg23916408 18.6048429
## 99 18.5744389 1.503781e+00 1.89144629 cg06950937 18.5744389
## 100 1.5227772 3.199074e+00 18.17528976 cg12784167 18.1752898
## 101 11.9132835 0.000000e+00 18.10151143 cg07480176 18.1015114
## 102 0.0000000 5.506519e+00 17.68798524 cg15865722 17.6879852
## 103 17.6817848 0.000000e+00 13.03892473 cg27577781 17.6817848
## 104 17.1550957 2.938004e+00 2.52930380 cg05321907 17.1550957
## 105 16.8401590 0.000000e+00 7.60307246 cg03660162 16.8401590
## 106 16.7411474 0.000000e+00 9.90396946 cg07138269 16.7411474
## 107 16.7140083 2.389325e-04 5.47626965 cg20139683 16.7140083
## 108 1.5162248 1.661662e+01 3.60056402 cg12284872 16.6166236
## 109 16.5242600 0.000000e+00 15.33313269 cg03327352 16.5242600
## 110 0.0000000 1.651025e+01 12.91106534 cg23658987 16.5102468
## 111 0.0000000 1.476695e+01 16.16221407 cg21854924 16.1622141
## 112 15.7813302 0.000000e+00 6.84076573 cg21697769 15.7813302
## 113 15.6779586 5.756032e+00 0.00000000 cg19512141 15.6779586
## 114 10.3116466 0.000000e+00 15.47815088 cg08198851 15.4781509
## 115 0.4280786 1.509018e+01 0.82840359 cg00675157 15.0901807
## 116 0.0000000 5.719351e+00 15.01578851 cg01153376 15.0157885
## 117 1.8016318 1.495028e+01 0.76033114 cg01933473 14.9502833
## 118 14.8786805 0.000000e+00 4.61118553 cg12776173 14.8786805
## 119 0.0000000 1.065936e+01 14.72386008 cg14564293 14.7238601
## 120 12.4166349 0.000000e+00 14.55809383 cg24851651 14.5580938
## 121 0.0000000 1.452280e+01 2.27124301 cg22274273 14.5228011
## 122 12.7993364 1.450845e+01 0.00000000 cg25561557 14.5084539
## 123 13.7813968 1.440283e+01 0.00000000 cg21209485 14.4028332
## 124 3.9030612 1.429224e+01 0.00000000 cg10985055 14.2922449
## 125 8.0847376 0.000000e+00 14.23403693 cg14293999 14.2340369
## 126 0.0000000 6.070016e+00 13.99431366 cg18819889 13.9943137
## 127 7.9364691 1.388963e+01 0.00000000 cg24506579 13.8896326
## 128 10.4693081 0.000000e+00 13.82844499 cg19377607 13.8284450
## 129 2.6082790 1.361679e+01 0.00000000 cg06697310 13.6167854
## 130 13.5523883 0.000000e+00 10.18252032 cg00696044 13.5523883
## 131 0.0000000 0.000000e+00 13.07862910 cg01549082 13.0786291
## 132 0.0000000 6.883520e+00 13.07489084 cg01128042 13.0748908
## 133 0.2735344 1.248508e+01 1.15822265 cg00999469 12.4850825
## 134 0.0000000 1.077830e+01 12.38511672 cg06118351 12.3851167
## 135 0.0000000 1.124520e+01 11.81413047 cg12012426 11.8141305
## 136 11.7410716 9.447329e+00 0.00000000 cg08584917 11.7410716
## 137 11.6834885 0.000000e+00 11.18198961 cg27272246 11.6834885
## 138 0.0000000 1.167004e+01 2.24079957 cg15633912 11.6700418
## 139 1.2020056 1.135081e+01 0.00000000 cg16788319 11.3508120
## 140 11.3467673 1.967722e+00 0.00000000 cg17906851 11.3467673
## 141 8.9763770 0.000000e+00 11.28260165 cg07028768 11.2826016
## 142 0.0000000 3.122665e+00 10.73693584 cg27086157 10.7369358
## 143 1.8134330 9.597518e+00 0.00000000 cg14240646 9.5975184
## 144 0.0000000 9.458714e+00 9.18234012 cg00154902 9.4587141
## 145 6.6678268 0.000000e+00 9.09857595 cg14307563 9.0985759
## 146 0.0000000 8.507666e+00 0.00000000 cg02320265 8.5076657
## 147 8.1996351 0.000000e+00 7.04889093 cg08779649 8.1996351
## 148 7.6525336 0.000000e+00 7.97801696 cg04664583 7.9780170
## 149 0.0000000 0.000000e+00 6.63057437 cg12466610 6.6305744
## 150 6.2606420 3.710632e+00 0.00000000 cg27639199 6.2606420
## 151 0.0000000 0.000000e+00 5.83487998 cg15501526 5.8348800
## 152 0.0000000 4.828481e+00 3.67138763 cg00689685 4.8284809
## 153 2.7832004 0.000000e+00 0.08909863 cg01413796 2.7832004
## 154 0.0000000 0.000000e+00 2.11969854 cg11247378 2.1196985
## 155 0.5214480 0.000000e+00 0.63591611 age.now 0.6359161
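To make the `pmax`-based ranking above concrete, here is a minimal sketch on toy importance values (the feature names and numbers are hypothetical, not from the fitted model):

```r
library(dplyr)
# Toy importance table (hypothetical features and values, illustrative only)
toy_imp <- data.frame(CN = c(10, 40), Dementia = c(30, 5), MCI = c(20, 25),
                      row.names = c("cgTOY1", "cgTOY2"))
toy_imp$Feature <- rownames(toy_imp)
# Same logic as above: row-wise maximum across classes, then sort descending
toy_imp <- toy_imp %>%
  mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
  arrange(desc(MaxImportance))
print(toy_imp$Feature)  # "cgTOY2" "cgTOY1"
```

`pmax` is element-wise, so each row gets the largest of its three class-specific importances, which is what the ranking above uses to order features.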
if (!requireNamespace("reshape2", quietly = TRUE)) {
  install.packages("reshape2")
}
library(reshape2)
if(METHOD_FEATURE_FLAG == 1){
importance_melted_LRM1_df <- importance_model_LRM1_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
## Warning: The melt generic in data.table has been passed a data.frame and will attempt to
## redirect to the relevant reshape2 method; please note that reshape2 is superseded and is no
## longer actively developed, and this redirection is now deprecated. To continue using melt
## methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the
## namespace, i.e. reshape2::melt(.). In the next version, this warning will become an error.
if(METHOD_FEATURE_FLAG == 1){
print(importance_model_LRM1_df %>% head(20))
print("the top 20 features based on max way:")
print(head(importance_model_LRM1_df,n=20)$Feature)
importance_melted_LRM1_df <- importance_model_LRM1_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
## CN Dementia MCI Feature MaxImportance
## 1 90.434483 100.000000 0.000000 PC1 100.00000
## 2 46.683922 78.708050 0.000000 PC2 78.70805
## 3 6.232816 0.000000 68.068145 PC3 68.06814
## 4 63.055240 11.820255 36.946183 cg00962106 63.05524
## 5 23.025642 12.636043 51.156689 cg02225060 51.15669
## 6 49.611460 8.375718 25.404473 cg14710850 49.61146
## 7 49.042666 17.847272 11.830943 cg27452255 49.04267
## 8 26.242139 5.623593 49.018845 cg02981548 49.01885
## 9 48.631674 0.000000 42.794147 cg08861434 48.63167
## 10 25.905633 48.113824 5.790029 cg19503462 48.11382
## 11 27.975827 46.742295 1.372553 cg07152869 46.74230
## 12 11.543972 17.975498 45.949100 cg16749614 45.94910
## 13 1.400701 44.886177 28.915759 cg05096415 44.88618
## 14 44.231810 3.492439 25.272367 cg23432430 44.23181
## 15 3.085789 41.990935 26.678873 cg17186592 41.99093
## 16 15.870146 41.651517 10.422960 cg00247094 41.65152
## 17 41.417047 6.519659 18.543595 cg09584650 41.41705
## 18 24.222974 0.000000 40.463667 cg11133939 40.46367
## 19 39.190681 7.687132 17.053681 cg16715186 39.19068
## 20 12.433522 38.574313 8.410475 cg03129555 38.57431
## [1] "the top 20 features based on max way:"
## [1] "PC1" "PC2" "PC3" "cg00962106" "cg02225060" "cg14710850" "cg27452255"
## [8] "cg02981548" "cg08861434" "cg19503462" "cg07152869" "cg16749614" "cg05096415" "cg23432430"
## [15] "cg17186592" "cg00247094" "cg09584650" "cg11133939" "cg16715186" "cg03129555"
## Warning: The melt generic in data.table has been passed a data.frame and will attempt to
## redirect to the relevant reshape2 method; please note that reshape2 is superseded and is no
## longer actively developed, and this redirection is now deprecated. To continue using melt
## methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the
## namespace, i.e. reshape2::melt(.). In the next version, this warning will become an error.
table(df_LRM1$DX)
##
## CN Dementia MCI
## 221 94 333
prop.table(table(df_LRM1$DX))
##
## CN Dementia MCI
## 0.3410494 0.1450617 0.5138889
table(trainData$DX)
##
## CN Dementia MCI
## 155 66 234
prop.table(table(trainData$DX))
##
## CN Dementia MCI
## 0.3406593 0.1450549 0.5142857
barplot(table(df_LRM1$DX), main = "Whole Data Class Distribution")
For the training data set:
barplot(table(trainData$DX), main = "Train Data Class Distribution")
Let’s calculate the imbalance ratio: the number of samples in the majority class divided by the number of samples in the minority class. A high ratio indicates severe class imbalance.
class_counts <- table(df_LRM1$DX)
imbalance_ratio <- max(class_counts) / min(class_counts)
print("The imbalance ratio of the whole data set is:")
## [1] "The imbalance ratio of the whole data set is:"
print(imbalance_ratio)
## [1] 3.542553
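As a self-contained sketch (toy counts, not from this dataset), the same ratio can be computed directly on a hypothetical class table:

```r
# Hypothetical class counts (illustrative only, not ADNI data)
toy_counts <- c(CN = 200, Dementia = 50, MCI = 300)
# Imbalance ratio: majority-class count over minority-class count
toy_ratio <- max(toy_counts) / min(toy_counts)
print(toy_ratio)  # 300 / 50 = 6, a fairly severe imbalance
```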
class_counts <- table(trainData$DX)
imbalance_ratio <- max(class_counts) / min(class_counts)
print("The imbalance ratio of the training data set is:")
## [1] "The imbalance ratio of the training data set is:"
print(imbalance_ratio)
## [1] 3.545455
Let’s run a Chi-square goodness-of-fit test, which can determine whether the class distribution deviates significantly from a balanced one. A small p-value indicates significant class imbalance.
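As a quick sanity check before applying the test to the real labels (toy counts, illustrative only): with perfectly balanced counts the test statistic is zero and the p-value is 1, so the tiny p-values below on the real data do signal imbalance.

```r
# Perfectly balanced hypothetical counts (illustrative only)
toy_balanced <- c(CN = 100, Dementia = 100, MCI = 100)
res_toy <- chisq.test(toy_balanced)
print(unname(res_toy$statistic))  # 0
print(res_toy$p.value)            # 1
```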
chisq.test(table(df_LRM1$DX))
##
## Chi-squared test for given probabilities
##
## data: table(df_LRM1$DX)
## X-squared = 132.4, df = 2, p-value < 2.2e-16
chisq.test(table(trainData$DX))
##
## Chi-squared test for given probabilities
##
## data: table(trainData$DX)
## X-squared = 93.156, df = 2, p-value < 2.2e-16
library(smotefamily)
smote_data_LGR_1 <- SMOTE(X = trainData[, !names(trainData) %in% "DX"], target = trainData$DX, K = 5, dup_size = 1)
balanced_data_LGR_1 <- smote_data_LGR_1$data
colnames(balanced_data_LGR_1)[colnames(balanced_data_LGR_1) == "class"] <- "DX"
table(balanced_data_LGR_1$DX)
##
## CN Dementia MCI
## 155 132 234
dim(balanced_data_LGR_1)
## [1] 521 156
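A minimal sketch of the same SMOTE call on toy two-feature data (class names and sizes are hypothetical, illustrative only). With `dup_size = 1`, `smotefamily::SMOTE` generates roughly one synthetic sample per minority observation, consistent with Dementia growing from 66 to 132 above:

```r
library(smotefamily)
set.seed(1)
# Toy numeric features with an imbalanced binary target (illustrative only)
toy_X <- data.frame(x1 = rnorm(60), x2 = rnorm(60))
toy_y <- c(rep("maj", 50), rep("min", 10))
toy_smote <- SMOTE(X = toy_X, target = toy_y, K = 5, dup_size = 1)
# The minority class is oversampled with synthetic points
table(toy_smote$data$class)
```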
ctrl <- trainControl(method = "cv", number = 5)
model_LRM2 <- caret::train(DX ~ ., data = balanced_data_LGR_1, method = "glmnet", trControl = ctrl)
predictions <- predict(model_LRM2, newdata = testData)
cm_modelTrain_LRM2<-caret::confusionMatrix(predictions, testData$DX)
print(cm_modelTrain_LRM2)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN Dementia MCI
## CN 45 6 15
## Dementia 4 11 6
## MCI 17 11 78
##
## Overall Statistics
##
## Accuracy : 0.6943
## 95% CI : (0.6241, 0.7584)
## No Information Rate : 0.513
## P-Value [Acc > NIR] : 2.356e-07
##
## Kappa : 0.4779
##
## Mcnemar's Test P-Value : 0.5733
##
## Statistics by Class:
##
## Class: CN Class: Dementia Class: MCI
## Sensitivity 0.6818 0.39286 0.7879
## Specificity 0.8346 0.93939 0.7021
## Pos Pred Value 0.6818 0.52381 0.7358
## Neg Pred Value 0.8346 0.90116 0.7586
## Prevalence 0.3420 0.14508 0.5130
## Detection Rate 0.2332 0.05699 0.4041
## Detection Prevalence 0.3420 0.10881 0.5492
## Balanced Accuracy 0.7582 0.66613 0.7450
cm_modelTrain_LRM2_Accuracy<-cm_modelTrain_LRM2$overall["Accuracy"]
cm_modelTrain_LRM2_Kappa<-cm_modelTrain_LRM2$overall["Kappa"]
print(cm_modelTrain_LRM2_Accuracy)
## Accuracy
## 0.6943005
print(cm_modelTrain_LRM2_Kappa)
## Kappa
## 0.477924
print(model_LRM2)
## glmnet
##
## 521 samples
## 155 predictors
## 3 classes: 'CN', 'Dementia', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 416, 417, 417, 417, 417
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0.10 0.000186946 0.7064835 0.5493874
## 0.10 0.001869460 0.7121978 0.5563269
## 0.10 0.018694597 0.7180220 0.5649649
## 0.55 0.000186946 0.7007143 0.5401066
## 0.55 0.001869460 0.7102930 0.5525186
## 0.55 0.018694597 0.6872894 0.5142517
## 1.00 0.000186946 0.6815018 0.5106741
## 1.00 0.001869460 0.7006777 0.5383593
## 1.00 0.018694597 0.6468864 0.4489232
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.0186946.
train_predictions <- predict(model_LRM2, newdata = trainData, type = "raw")
train_accuracy <- mean(train_predictions == trainData$DX)
modelTrain_LRM2_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", modelTrain_LRM2_trainAccuracy))
## [1] "Training Accuracy: 0.958241758241758"
mean_accuracy_model_LRM2 <- mean(model_LRM2$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM2)
## [1] 0.6960073
modelTrain_LRM2_mean_accuracy_model_LRM2 <- mean_accuracy_model_LRM2
print(modelTrain_LRM2_mean_accuracy_model_LRM2)
## [1] 0.6960073
importance_model_LRM2 <- varImp(model_LRM2)
print(importance_model_LRM2)
## glmnet variable importance
##
## variables are sorted by maximum importance across the classes
## only 20 most important variables shown (out of 155)
##
## CN Dementia MCI
## PC1 80.704 100.000 0.000
## PC2 38.892 80.705 0.000
## cg00962106 56.201 9.090 33.503
## PC3 7.689 0.000 55.751
## cg19503462 26.316 48.654 6.550
## cg27452255 47.897 21.174 8.087
## cg07152869 27.965 46.007 1.318
## cg02225060 18.278 12.778 45.594
## cg05096415 3.335 45.575 28.308
## cg14710850 45.318 8.637 21.709
## cg02981548 23.101 5.918 45.304
## cg08861434 44.834 0.000 36.637
## cg03129555 14.445 41.997 10.550
## cg23432430 41.986 6.868 20.302
## cg16749614 8.925 17.017 41.743
## cg17186592 3.594 40.128 25.162
## cg14924512 1.844 38.968 23.218
## cg09584650 38.239 7.574 15.083
## cg06864789 13.551 38.069 11.887
## cg03084184 19.827 37.852 3.069
plot(importance_model_LRM2, top = 20, main = "Variable Importance Plot")
importance_model_LRM2_df<-importance_model_LRM2$importance
if(METHOD_FEATURE_FLAG==3||METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG ==5 || METHOD_FEATURE_FLAG == 6){
importance_final_model_LRM2 <- varImp(model_LRM2$finalModel)
library(dplyr)
ordered_importance_final_model_LRM2 <- importance_final_model_LRM2 %>% arrange(desc(Overall))
print(ordered_importance_final_model_LRM2)
}
if(METHOD_FEATURE_FLAG==1){
# For the multi-class classification case, keep each feature's
# maximum importance value across the three classes
importance_model_LRM2_df$Feature<-rownames(importance_model_LRM2_df)
importance_model_LRM2_df <- importance_model_LRM2_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_model_LRM2_df)
}
## CN Dementia MCI Feature MaxImportance
## 1 80.703686141 100.00000000 0.000000000 PC1 100.0000000
## 2 38.892106774 80.70482771 0.000000000 PC2 80.7048277
## 3 56.201479700 9.08972707 33.503421018 cg00962106 56.2014797
## 4 7.688739102 0.00000000 55.751183885 PC3 55.7511839
## 5 26.316001409 48.65418043 6.550251667 cg19503462 48.6541804
## 6 47.896588361 21.17415643 8.086550372 cg27452255 47.8965884
## 7 27.965098019 46.00690924 1.318166875 cg07152869 46.0069092
## 8 18.278300515 12.77780684 45.593955040 cg02225060 45.5939550
## 9 3.334662043 45.57501817 28.308064223 cg05096415 45.5750182
## 10 45.317553155 8.63704057 21.709092237 cg14710850 45.3175532
## 11 23.101263030 5.91845809 45.303881158 cg02981548 45.3038812
## 12 44.833547439 0.00000000 36.636739164 cg08861434 44.8335474
## 13 14.444863358 41.99728281 10.550232984 cg03129555 41.9972828
## 14 41.986435467 6.86821373 20.302170331 cg23432430 41.9864355
## 15 8.925154158 17.01732091 41.743083297 cg16749614 41.7430833
## 16 3.594336479 40.12756008 25.162036726 cg17186592 40.1275601
## 17 1.844403961 38.96836649 23.218245781 cg14924512 38.9683665
## 18 38.238819997 7.57425527 15.082827531 cg09584650 38.2388200
## 19 13.550836792 38.06891722 11.886579897 cg06864789 38.0689172
## 20 19.826782299 37.85235354 3.068918378 cg03084184 37.8523535
## 21 21.494484511 0.51248954 37.519314166 cg11133939 37.5193142
## 22 13.590537474 37.17398397 9.106089935 cg00247094 37.1739840
## 23 0.546760644 20.67374130 35.720839473 cg08857872 35.7208395
## 24 35.478778383 7.95324361 14.045617414 cg16715186 35.4787784
## 25 4.944695358 35.05438841 17.439917764 cg24859648 35.0543884
## 26 14.088834696 34.55990196 5.441433884 cg12279734 34.5599020
## 27 1.721991260 34.09480960 18.443604362 cg25259265 34.0948096
## 28 8.420908584 34.06039806 11.644562269 cg06378561 34.0603981
## 29 2.318696798 13.36834282 31.989893449 cg26219488 31.9898934
## 30 12.462516572 31.58502611 5.781491250 cg20913114 31.5850261
## 31 5.481129412 11.23741551 31.361194654 cg16652920 31.3611947
## 32 1.409335240 30.97517179 17.381030655 cg05841700 30.9751718
## 33 29.684237819 14.08949397 0.800327955 cg26948066 29.6842378
## 34 28.741079800 12.28173949 0.038600884 cg03982462 28.7410798
## 35 28.246236512 8.07867524 6.652009298 cg11227702 28.2462365
## 36 6.451683537 28.03614623 8.129228528 cg09854620 28.0361462
## 37 27.457851489 0.00000000 21.578341138 cg06536614 27.4578515
## 38 7.548981775 9.68673919 27.091557976 cg02621446 27.0915580
## 39 0.000000000 26.98140952 24.133953778 cg02494911 26.9814095
## 40 20.434384880 0.00000000 26.640690302 cg12146221 26.6406903
## 41 0.000000000 25.78989084 26.596408896 cg00616572 26.5964089
## 42 9.535314131 26.42283712 5.646486425 cg10750306 26.4228371
## 43 26.171246498 7.89239684 6.028970680 cg15535896 26.1712465
## 44 1.141241069 25.92805819 13.642098297 cg01667144 25.9280582
## 45 0.000000000 25.62258868 13.483020973 cg24861747 25.6225887
## 46 25.562890663 15.08442524 0.000000000 cg10240127 25.5628907
## 47 24.107334555 0.00000000 25.129944649 cg02372404 25.1299446
## 48 1.099920364 8.20445533 25.057279275 cg06715136 25.0572793
## 49 24.802706624 0.00000000 16.164643453 cg20685672 24.8027066
## 50 0.000000000 24.74797600 14.637492919 cg05570109 24.7479760
## 51 24.747252306 0.00000000 13.430078586 cg04248279 24.7472523
## 52 4.039949965 5.50360620 24.334481230 cg20678988 24.3344812
## 53 0.000000000 24.20182274 18.412414046 cg12534577 24.2018227
## 54 0.000000000 24.14528640 15.852423949 cg16579946 24.1452864
## 55 4.824103433 24.11211052 5.706278058 cg12738248 24.1121105
## 56 6.529552881 5.93633077 24.070923901 cg16771215 24.0709239
## 57 23.998697732 10.16375705 0.028444076 cg13080267 23.9986977
## 58 5.508573597 5.66470926 23.060145881 cg17738613 23.0601459
## 59 22.320691887 6.53204639 5.663980561 cg11331837 22.3206919
## 60 0.000000000 22.29255346 17.220491420 cg01680303 22.2925535
## 61 22.203937989 0.00000000 13.212696497 cg04412904 22.2039380
## 62 0.000000000 22.06075522 14.962802230 cg18821122 22.0607552
## 63 3.426205425 7.31616766 22.054014997 cg12682323 22.0540150
## 64 22.044090381 16.25405024 0.000000000 cg02356645 22.0440904
## 65 0.000000000 20.81120723 22.035886216 cg24873924 22.0358862
## 66 0.000000000 15.83806198 22.017151253 cg10369879 22.0171513
## 67 6.483591345 21.73850252 0.949117939 cg01013522 21.7385025
## 68 16.484539947 0.00000000 21.585055537 cg12228670 21.5850555
## 69 7.512558194 21.11431943 0.000000000 cg07523188 21.1143194
## 70 21.107468333 18.07754102 0.000000000 cg15775217 21.1074683
## 71 21.004224876 0.00000000 16.895333275 cg03071582 21.0042249
## 72 20.964839525 0.00000000 12.103259685 cg05234269 20.9648395
## 73 0.000000000 20.92290026 7.904303978 cg20507276 20.9229003
## 74 0.000000000 19.13703698 20.818072609 cg27341708 20.8180726
## 75 20.448742995 8.88181889 0.345152147 cg03088219 20.4487430
## 76 13.178540035 20.43950122 0.000000000 cg25561557 20.4395012
## 77 20.418843523 0.00000000 19.515999145 cg01921484 20.4188435
## 78 4.716200159 20.18026145 4.191000972 cg26069044 20.1802615
## 79 20.114237367 0.00000000 7.566075546 cg06112204 20.1142374
## 80 20.092712908 0.00000000 10.279542242 cg25758034 20.0927129
## 81 20.070487519 0.22767831 9.404986617 cg17421046 20.0704875
## 82 19.734694924 0.00000000 9.880887107 cg17429539 19.7346949
## 83 19.722311790 0.00000000 12.781522977 cg11438323 19.7223118
## 84 19.506576639 14.87078135 0.000000000 cg00322003 19.5065766
## 85 19.312640558 4.15131066 4.739257476 cg11187460 19.3126406
## 86 2.519503668 5.41476250 18.974767697 cg25879395 18.9747677
## 87 4.053920988 18.83733509 0.222787042 cg26474732 18.8373351
## 88 2.902849319 18.77753865 2.410773841 cg23161429 18.7775387
## 89 1.678228292 4.79451720 18.695449561 cg20370184 18.6954496
## 90 18.635058437 0.02030071 6.333516204 cg25436480 18.6350584
## 91 0.009367363 7.65726758 18.620269493 cg13885788 18.6202695
## 92 11.432827083 18.29052960 0.000000000 cg23916408 18.2905296
## 93 0.000000000 16.67394266 18.169205978 cg14527649 18.1692060
## 94 5.006391021 1.01514715 18.052832535 cg10738648 18.0528325
## 95 0.000000000 17.95587603 12.792821501 cg23658987 17.9558760
## 96 5.979693923 17.93087828 1.282292283 cg18339359 17.9308783
## 97 10.251947671 0.00000000 17.825772284 cg07480176 17.8257723
## 98 16.786642859 17.80306807 0.000000000 cg26757229 17.8030681
## 99 2.978375115 17.78146141 4.058113259 cg12284872 17.7814614
## 100 17.462538616 8.50325820 0.000000000 cg02932958 17.4625386
## 101 8.075452039 17.45484195 0.000000000 cg24506579 17.4548420
## 102 13.340238124 0.00000000 17.328520368 cg00272795 17.3285204
## 103 0.000000000 7.46135917 17.205305045 cg12784167 17.2053050
## 104 16.749558824 0.00000000 6.659641924 cg03660162 16.7495588
## 105 0.000000000 16.02418230 16.431224038 cg16178271 16.4312240
## 106 16.369728471 0.00000000 11.970813245 cg27577781 16.3697285
## 107 16.142509014 0.00000000 8.252708167 cg07138269 16.1425090
## 108 15.966964085 2.87389006 2.064971565 cg05321907 15.9669641
## 109 0.752128487 15.69090694 2.151492812 cg22274273 15.6909069
## 110 0.464584499 3.15826720 15.547431322 cg15865722 15.5474313
## 111 13.410477411 15.53876646 0.000000000 cg21209485 15.5387665
## 112 15.460415810 0.62945144 3.699987481 cg20139683 15.4604158
## 113 0.804687613 15.26727627 2.246868395 cg15633912 15.2672763
## 114 1.785059975 15.21041165 0.497890838 cg00675157 15.2104117
## 115 0.000000000 15.03562942 13.712510585 cg21854924 15.0356294
## 116 0.000000000 8.28447756 14.989389757 cg14564293 14.9893898
## 117 1.414757231 14.66617650 1.617046584 cg01933473 14.6661765
## 118 14.335634479 0.00000000 2.357215925 cg06950937 14.3356345
## 119 7.029045619 0.00000000 14.262088984 cg14293999 14.2620890
## 120 0.000000000 7.60256780 14.106906124 cg01128042 14.1069061
## 121 13.942362711 0.00000000 2.049705730 cg12776173 13.9423627
## 122 13.939989220 0.00000000 13.916599358 cg03327352 13.9399892
## 123 8.354050412 0.00000000 13.901700226 cg24851651 13.9017002
## 124 8.499466117 0.00000000 13.725880639 cg19377607 13.7258806
## 125 13.691076440 0.00000000 7.338783387 cg00696044 13.6910764
## 126 0.000000000 2.81944706 13.617942761 cg01153376 13.6179428
## 127 13.585923001 3.87624147 0.000000000 cg19512141 13.5859230
## 128 0.000000000 6.29261372 13.547936670 cg18819889 13.5479367
## 129 8.866699352 0.00000000 13.130459074 cg27272246 13.1304591
## 130 12.210246938 0.00000000 12.998907898 cg08198851 12.9989079
## 131 0.000000000 9.82359181 12.661949537 cg06118351 12.6619495
## 132 4.079204165 12.38089824 0.000000000 cg10985055 12.3808982
## 133 0.930325099 11.77391661 0.006453595 cg16788319 11.7739166
## 134 1.061914720 11.72164162 0.000000000 cg14240646 11.7216416
## 135 0.794060278 11.56506514 0.391257306 cg00999469 11.5650651
## 136 0.000000000 11.34570309 10.958707137 cg12012426 11.3457031
## 137 0.000000000 2.70078848 10.861921805 cg01549082 10.8619218
## 138 10.752888079 0.00000000 9.149308817 cg21697769 10.7528881
## 139 10.661004204 0.00000000 7.583647513 cg07028768 10.6610042
## 140 10.322517299 3.96305045 0.000000000 cg17906851 10.3225173
## 141 0.000000000 8.38569469 9.801351175 cg27086157 9.8013512
## 142 0.300234216 9.75810599 0.000000000 cg06697310 9.7581060
## 143 9.748681749 9.22022799 0.000000000 cg08584917 9.7486817
## 144 2.496354466 0.00000000 9.517519131 cg04664583 9.5175191
## 145 0.596325741 9.50789659 0.000000000 cg02320265 9.5078966
## 146 4.880785773 0.00000000 8.716856857 cg14307563 8.7168569
## 147 6.221593172 0.00000000 8.462523133 cg08779649 8.4625231
## 148 0.000000000 6.07441901 7.328898485 cg00154902 7.3288985
## 149 0.000000000 0.00000000 6.410132361 cg12466610 6.4101324
## 150 6.361107171 4.10109446 0.000000000 cg27639199 6.3611072
## 151 0.000000000 5.85370045 4.811164869 cg00689685 5.8537004
## 152 0.000000000 2.99739856 5.183362714 cg15501526 5.1833627
## 153 2.829967253 0.00000000 0.000000000 cg01413796 2.8299673
## 154 0.421123998 0.00000000 0.566168128 age.now 0.5661681
## 155 0.000000000 0.42855835 0.032750493 cg11247378 0.4285583
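The ranking printed above keeps, for each feature, its largest importance across the three diagnosis classes. A minimal sketch of that `pmax`-based selection on a toy table (all numbers invented for illustration):

```r
library(dplyr)

# Toy importance table; the values are made up for illustration only.
toy <- data.frame(
  CN       = c(80.7,  3.3, 45.3),
  Dementia = c(100.0, 45.6,  8.6),
  MCI      = c(0.0,  28.3, 21.7),
  row.names = c("PC1", "cg05096415", "cg14710850")
)
toy$Feature <- rownames(toy)

# Keep the largest importance across the classes, then rank by it.
ranked <- toy %>%
  mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
  arrange(desc(MaxImportance))
print(ranked$Feature)  # "PC1" "cg05096415" "cg14710850"
```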
if(METHOD_FEATURE_FLAG == 1){
importance_melted_LRM2_df <- importance_model_LRM2_df %>%
dplyr::select(-MaxImportance) %>%
reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM2_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
## Warning: The melt generic in data.table has been passed a data.frame and will attempt to
## redirect to the relevant reshape2 method; please note that reshape2 is superseded and is no
## longer actively developed, and this redirection is now deprecated. To continue using melt
## methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the
## namespace, i.e. reshape2::melt(.). In the next version, this warning will become an error.
if(METHOD_FEATURE_FLAG == 1){
print(importance_model_LRM2_df %>% head(20))
print("The top 20 features ranked by maximum importance:")
print(head(importance_model_LRM2_df, n = 20)$Feature)
importance_melted_LRM2_df <- importance_model_LRM2_df %>%
head(20) %>%
dplyr::select(-MaxImportance) %>%
reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM2_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
## CN Dementia MCI Feature MaxImportance
## 1 80.703686 100.000000 0.000000 PC1 100.00000
## 2 38.892107 80.704828 0.000000 PC2 80.70483
## 3 56.201480 9.089727 33.503421 cg00962106 56.20148
## 4 7.688739 0.000000 55.751184 PC3 55.75118
## 5 26.316001 48.654180 6.550252 cg19503462 48.65418
## 6 47.896588 21.174156 8.086550 cg27452255 47.89659
## 7 27.965098 46.006909 1.318167 cg07152869 46.00691
## 8 18.278301 12.777807 45.593955 cg02225060 45.59396
## 9 3.334662 45.575018 28.308064 cg05096415 45.57502
## 10 45.317553 8.637041 21.709092 cg14710850 45.31755
## 11 23.101263 5.918458 45.303881 cg02981548 45.30388
## 12 44.833547 0.000000 36.636739 cg08861434 44.83355
## 13 14.444863 41.997283 10.550233 cg03129555 41.99728
## 14 41.986435 6.868214 20.302170 cg23432430 41.98644
## 15 8.925154 17.017321 41.743083 cg16749614 41.74308
## 16 3.594336 40.127560 25.162037 cg17186592 40.12756
## 17 1.844404 38.968366 23.218246 cg14924512 38.96837
## 18 38.238820 7.574255 15.082828 cg09584650 38.23882
## 19 13.550837 38.068917 11.886580 cg06864789 38.06892
## 20 19.826782 37.852354 3.068918 cg03084184 37.85235
## [1] "The top 20 features ranked by maximum importance:"
## [1] "PC1" "PC2" "cg00962106" "PC3" "cg19503462" "cg27452255" "cg07152869"
## [8] "cg02225060" "cg05096415" "cg14710850" "cg02981548" "cg08861434" "cg03129555" "cg23432430"
## [15] "cg16749614" "cg17186592" "cg14924512" "cg09584650" "cg06864789" "cg03084184"
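The `melt` step used in these chunks relies on reshape2, which is superseded; `tidyr::pivot_longer` is the currently maintained equivalent for the same wide-to-long reshape. A sketch on a toy importance table (values invented for illustration):

```r
library(dplyr)
library(tidyr)

# Toy wide importance table (two features, three classes).
toy <- data.frame(Feature  = c("PC1", "PC2"),
                  CN       = c(80.7, 38.9),
                  Dementia = c(100.0, 80.7),
                  MCI      = c(0.0, 0.0))

# Wide -> long: one row per (Feature, Class) pair, ready for ggplot.
long <- toy %>%
  pivot_longer(cols = c(CN, Dementia, MCI),
               names_to = "Class", values_to = "Importance")
nrow(long)  # 2 features x 3 classes = 6 rows
```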
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX, prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
modelTrain_LRM2_AUC <-auc_value
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX, prob_predictions[, "Dementia"], levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
modelTrain_LRM2_AUC <-auc_value
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX, prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
modelTrain_LRM2_AUC <-auc_value
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData$DX)
for (class in classes) {
binary_labels <- ifelse(testData$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
# col = 2 for the first class and col = i + 1 for the rest keeps the
# legend colours in sync with the plotted curves
plot(roc_curves[[1]], col = 2,
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i + 1, lwd = 2)
}
legend("bottomright", legend = classes, col = 1:length(classes) + 1, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.8505
## The AUC value for class CN is: 0.850513
##
## Class: Dementia
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.8357
## The AUC value for class Dementia is: 0.8357143
##
## Class: MCI
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.8188
## The AUC value for class MCI is: 0.8188266
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
modelTrain_LRM2_AUC <-mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.835018
print(modelTrain_LRM2_AUC)
## [1] 0.835018
df_ENM1 <- processed_data
featureName_ENM1 <- AfterProcess_FeatureName
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_ENM1$DX, p = 0.7, list = FALSE)
trainData_ENM1 <- df_ENM1[trainIndex, ]
testData_ENM1 <- df_ENM1[-trainIndex, ]
ctrl <- trainControl(method = "cv", number = 5)
param_grid <- expand.grid(alpha = 0:1, lambda = seq(0.001, 1, length = 20))
elastic_net_model1 <- caret::train(DX ~ ., data = trainData_ENM1, method = "glmnet",
trControl = ctrl, tuneGrid = param_grid)
print(elastic_net_model1)
## glmnet
##
## 455 samples
## 155 predictors
## 3 classes: 'CN', 'Dementia', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 364, 365, 363, 364, 364
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0 0.00100000 0.6571736 0.42345797
## 0 0.05357895 0.6725349 0.43439423
## 0 0.10615789 0.6747338 0.43094148
## 0 0.15873684 0.6725599 0.42391171
## 0 0.21131579 0.6725837 0.41818370
## 0 0.26389474 0.6770526 0.42406079
## 0 0.31647368 0.6769804 0.41856449
## 0 0.36905263 0.6726087 0.40853473
## 0 0.42163158 0.6638170 0.38542265
## 0 0.47421053 0.6660148 0.38902178
## 0 0.52678947 0.6594214 0.37628816
## 0 0.57936842 0.6550252 0.36510400
## 0 0.63194737 0.6528274 0.35927177
## 0 0.68452632 0.6418618 0.33471759
## 0 0.73710526 0.6352200 0.31832804
## 0 0.78968421 0.6307756 0.30720022
## 0 0.84226316 0.6263800 0.29777058
## 0 0.89484211 0.6220322 0.28739881
## 0 0.94742105 0.6220322 0.28739881
## 0 1.00000000 0.6220322 0.28682520
## 1 0.00100000 0.6240596 0.37352512
## 1 0.05357895 0.5187546 0.05457313
## 1 0.10615789 0.5142862 0.00000000
## 1 0.15873684 0.5142862 0.00000000
## 1 0.21131579 0.5142862 0.00000000
## 1 0.26389474 0.5142862 0.00000000
## 1 0.31647368 0.5142862 0.00000000
## 1 0.36905263 0.5142862 0.00000000
## 1 0.42163158 0.5142862 0.00000000
## 1 0.47421053 0.5142862 0.00000000
## 1 0.52678947 0.5142862 0.00000000
## 1 0.57936842 0.5142862 0.00000000
## 1 0.63194737 0.5142862 0.00000000
## 1 0.68452632 0.5142862 0.00000000
## 1 0.73710526 0.5142862 0.00000000
## 1 0.78968421 0.5142862 0.00000000
## 1 0.84226316 0.5142862 0.00000000
## 1 0.89484211 0.5142862 0.00000000
## 1 0.94742105 0.5142862 0.00000000
## 1 1.00000000 0.5142862 0.00000000
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0 and lambda = 0.2638947.
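In the grid above, alpha = 0 is the pure ridge penalty and alpha = 1 the pure lasso, so caret selected a ridge model with lambda ≈ 0.264. A hedged sketch of the equivalent direct glmnet call on synthetic data (dimensions and values are illustrative stand-ins, not the ADNI data):

```r
library(glmnet)
set.seed(123)

# Synthetic stand-in for the training matrix: 100 samples, 10 predictors.
x <- matrix(rnorm(100 * 10), nrow = 100)
y <- factor(sample(c("CN", "Dementia", "MCI"), 100, replace = TRUE))

# alpha = 0 is the ridge penalty; family = "multinomial" handles 3 classes.
ridge_fit <- glmnet(x, y, family = "multinomial", alpha = 0,
                    lambda = 0.2638947)
length(coef(ridge_fit))  # one coefficient set per class
```

caret performs this fit internally for every (alpha, lambda) pair in `param_grid` and keeps the combination with the highest cross-validated accuracy.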
mean_accuracy_elastic_net_model1 <- mean(elastic_net_model1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_elastic_net_model1)
## [1] 0.5868408
modelTrain_mean_accuracy_cv_ENM1 <- mean_accuracy_elastic_net_model1
print(modelTrain_mean_accuracy_cv_ENM1)
## [1] 0.5868408
train_predictions <- predict(elastic_net_model1, newdata = trainData_ENM1, type = "raw")
train_accuracy <- mean(train_predictions == trainData_ENM1$DX)
modelTrain_ENM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.863736263736264"
print(modelTrain_ENM1_trainAccuracy)
## [1] 0.8637363
predictions <- predict(elastic_net_model1, newdata = testData_ENM1)
cm_modelTrain_ENM1<- caret::confusionMatrix(predictions,testData_ENM1$DX)
print(cm_modelTrain_ENM1)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN Dementia MCI
## CN 45 5 13
## Dementia 0 8 0
## MCI 21 15 86
##
## Overall Statistics
##
## Accuracy : 0.7202
## 95% CI : (0.6512, 0.7823)
## No Information Rate : 0.513
## P-Value [Acc > NIR] : 3.473e-09
##
## Kappa : 0.4987
##
## Mcnemar's Test P-Value : 6.901e-05
##
## Statistics by Class:
##
## Class: CN Class: Dementia Class: MCI
## Sensitivity 0.6818 0.28571 0.8687
## Specificity 0.8583 1.00000 0.6170
## Pos Pred Value 0.7143 1.00000 0.7049
## Neg Pred Value 0.8385 0.89189 0.8169
## Prevalence 0.3420 0.14508 0.5130
## Detection Rate 0.2332 0.04145 0.4456
## Detection Prevalence 0.3264 0.04145 0.6321
## Balanced Accuracy 0.7700 0.64286 0.7429
cm_modelTrain_ENM1_Accuracy <- cm_modelTrain_ENM1$overall["Accuracy"]
print(cm_modelTrain_ENM1_Accuracy)
## Accuracy
## 0.7202073
cm_modelTrain_ENM1_Kappa <- cm_modelTrain_ENM1$overall["Kappa"]
print(cm_modelTrain_ENM1_Kappa)
## Kappa
## 0.4986772
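Cohen's kappa reported above discounts the accuracy expected by chance from the row and column margins. Recomputing both statistics by hand from the printed confusion matrix reproduces the caret values:

```r
# Confusion matrix as printed above (rows = predictions, columns = reference).
cm <- matrix(c(45,  5, 13,
                0,  8,  0,
               21, 15, 86),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = c("CN", "Dementia", "MCI"),
                             Reference  = c("CN", "Dementia", "MCI")))

n        <- sum(cm)
observed <- sum(diag(cm)) / n                     # observed accuracy
expected <- sum(rowSums(cm) * colSums(cm)) / n^2  # chance agreement
kappa    <- (observed - expected) / (1 - expected)
round(c(accuracy = observed, kappa = kappa), 4)   # 0.7202, 0.4987
```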
importance_elastic_net_model1<- varImp(elastic_net_model1)
print(importance_elastic_net_model1)
## glmnet variable importance
##
## variables are sorted by maximum importance across the classes
## only 20 most important variables shown (out of 155)
##
## CN Dementia MCI
## PC1 86.62 100.000 13.321
## PC2 68.42 88.612 20.132
## cg00962106 72.97 12.366 60.547
## cg02225060 43.14 18.834 62.035
## cg02981548 49.98 8.979 59.014
## cg23432430 57.30 15.766 41.471
## cg14710850 54.51 8.371 46.083
## cg16749614 20.69 33.680 54.425
## cg07152869 48.28 54.282 5.938
## cg08857872 29.00 24.416 53.480
## cg16652920 27.04 25.381 52.485
## cg26948066 51.17 42.097 9.011
## PC3 12.10 38.684 50.845
## cg08861434 48.61 1.041 49.709
## cg27452255 49.50 29.755 19.689
## cg09584650 48.12 20.551 27.505
## cg11133939 31.92 15.800 47.784
## cg19503462 47.24 44.923 2.257
## cg06864789 20.57 46.480 25.853
## cg02372404 30.75 14.690 45.496
plot(importance_elastic_net_model1, top = 20, main = "Variable Importance Plot")
importance_elastic_net_model1_df<-importance_elastic_net_model1$importance
if(METHOD_FEATURE_FLAG==3 ||METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG == 5 || METHOD_FEATURE_FLAG==6){
importance_elastic_net_final_model1 <- varImp(elastic_net_model1$finalModel)
library(dplyr)
Ordered_importance_elastic_net_final_model1 <- importance_elastic_net_final_model1 %>% arrange(desc(Overall))
print(Ordered_importance_elastic_net_final_model1)
}
if(METHOD_FEATURE_FLAG==1){
# For the multi-class classification case, take the maximum
# importance value across the three classes for each feature
# and store it in a new MaxImportance column.
importance_elastic_net_model1_df$Feature<-rownames(importance_elastic_net_model1_df)
importance_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_elastic_net_model1_df)
}
## CN Dementia MCI Feature MaxImportance
## 1 86.61980738 100.0000000 13.3210618 PC1 100.0000000
## 2 68.42071110 88.6123141 20.1324722 PC2 88.6123141
## 3 72.97240265 12.3659467 60.5473251 cg00962106 72.9724027
## 4 43.14232480 18.8338927 62.0353483 cg02225060 62.0353483
## 5 49.97512219 8.9794598 59.0137128 cg02981548 59.0137128
## 6 57.29678176 15.7661987 41.4714522 cg23432430 57.2967818
## 7 54.51398398 8.3713581 46.0834950 cg14710850 54.5139840
## 8 20.68641851 33.6797465 54.4252959 cg16749614 54.4252959
## 9 48.28492677 54.2816940 5.9376364 cg07152869 54.2816940
## 10 29.00490368 24.4161325 53.4801671 cg08857872 53.4801671
## 11 27.04478573 25.3809708 52.4848873 cg16652920 52.4848873
## 12 51.16762313 42.0970116 9.0114807 cg26948066 51.1676231
## 13 12.10171481 38.6842811 50.8451268 PC3 50.8451268
## 14 48.60846917 1.0414802 49.7090802 cg08861434 49.7090802
## 15 49.50310946 29.7550893 19.6888893 cg27452255 49.5031095
## 16 48.11512671 20.5506815 27.5053143 cg09584650 48.1151267
## 17 31.92434694 15.8003166 47.7837944 cg11133939 47.7837944
## 18 47.23931827 44.9227936 2.2573938 cg19503462 47.2393183
## 19 20.56740794 46.4799364 25.8533976 cg06864789 46.4799364
## 20 30.74658536 14.6900138 45.4957300 cg02372404 45.4957300
## 21 13.69843362 45.3182174 31.5606530 cg24859648 45.3182174
## 22 10.38803025 34.7255433 45.1727044 cg14527649 45.1727044
## 23 44.71363816 32.6644840 11.9900233 cg03982462 44.7136382
## 24 43.78509791 14.9929464 28.7330206 cg06536614 43.7850979
## 25 0.06067044 43.3030784 43.1832771 cg17186592 43.3030784
## 26 26.35599836 16.7605895 43.1757187 cg26219488 43.1757187
## 27 42.96866193 14.0889255 28.8206056 cg10240127 42.9686619
## 28 13.43834291 42.8997598 29.4022860 cg00247094 42.8997598
## 29 35.47709793 6.8655424 42.4017712 cg20685672 42.4017712
## 30 3.60009203 42.1583119 38.4990890 cg25259265 42.1583119
## 31 42.14830982 14.2620551 27.8271239 cg16715186 42.1483098
## 32 0.72192332 41.9398729 41.1588187 cg05096415 41.9398729
## 33 34.83789455 41.7675213 6.8704959 cg15775217 41.7675213
## 34 15.97002286 40.5910153 24.5618616 cg24861747 40.5910153
## 35 34.02805902 6.2237390 40.3109289 cg07028768 40.3109289
## 36 4.43445619 39.7357422 35.2421551 cg14924512 39.7357422
## 37 24.98145608 39.6420632 14.6014763 cg03084184 39.6420632
## 38 4.47207238 39.0722526 34.5410494 cg05570109 39.0722526
## 39 34.88239186 4.0051024 38.9466251 cg01921484 38.9466251
## 40 9.76731403 27.7961405 37.6225854 cg00154902 37.6225854
## 41 28.32807068 37.4435859 9.0563843 cg26757229 37.4435859
## 42 37.36311741 9.8522199 27.4517666 cg03660162 37.3631174
## 43 35.88740462 0.5246911 36.4712266 cg12228670 36.4712266
## 44 4.42463153 31.7466394 36.2304018 cg00616572 36.2304018
## 45 14.12327162 36.1674624 21.9850599 cg20507276 36.1674624
## 46 5.46685523 35.4547743 29.9287882 cg05841700 35.4547743
## 47 21.87332560 13.5177259 35.4501824 cg06715136 35.4501824
## 48 22.83960358 12.2764600 35.1751945 cg02621446 35.1751945
## 49 18.36553359 35.0243219 16.5996575 cg12738248 35.0243219
## 50 14.23671333 34.9439501 20.6481059 cg09854620 34.9439501
## 51 32.22002224 34.8183721 2.5392190 cg00322003 34.8183721
## 52 8.08934808 26.6092839 34.7577628 cg24873924 34.7577628
## 53 14.18500560 34.7017215 20.4575850 cg03129555 34.7017215
## 54 34.68040616 7.5913417 27.0299336 cg04412904 34.6804062
## 55 15.01714515 19.5748660 34.6511420 cg17738613 34.6511420
## 56 18.92764654 15.5954372 34.5822146 cg25879395 34.5822146
## 57 34.34592880 10.8922720 23.3945260 cg05234269 34.3459288
## 58 22.75324748 34.0758121 11.2634338 cg20913114 34.0758121
## 59 1.11024972 32.5737876 33.7431682 cg02494911 33.7431682
## 60 17.47533175 33.5173057 15.9828431 cg00675157 33.5173057
## 61 26.90964711 33.4667383 6.4979603 cg12279734 33.4667383
## 62 12.81244098 20.5534266 33.4249985 cg01153376 33.4249985
## 63 30.30228696 2.9712219 33.3326397 cg04248279 33.3326397
## 64 30.64177910 33.2101071 2.5091972 cg06697310 33.2101071
## 65 25.58169803 32.8947626 7.2539337 cg26474732 32.8947626
## 66 19.20507436 13.6298519 32.8940571 cg16771215 32.8940571
## 67 1.21872114 32.7015628 31.4237108 cg12534577 32.7015628
## 68 14.55299268 32.4375273 17.8254038 cg06378561 32.4375273
## 69 19.19337215 13.1667746 32.4192776 cg18819889 32.4192776
## 70 29.78124856 32.2253710 2.3849916 cg01013522 32.2253710
## 71 8.94008603 23.2172015 32.2164184 cg10369879 32.2164184
## 72 31.34262662 9.3197110 21.9637847 cg03327352 31.3426266
## 73 31.30355116 8.6991762 22.5452441 cg07138269 31.3035512
## 74 30.28320694 0.7213164 31.0636542 cg12146221 31.0636542
## 75 31.02005091 11.5458527 19.4150674 cg11227702 31.0200509
## 76 30.51451274 0.2100065 30.7836501 cg27577781 30.7836501
## 77 30.74231821 29.3053393 1.3778481 cg02356645 30.7423182
## 78 10.89284695 19.6097264 30.5617042 cg15865722 30.5617042
## 79 21.13422867 30.5356551 9.3422956 cg18339359 30.5356551
## 80 21.72890824 30.5064745 8.7184354 cg08584917 30.5064745
## 81 30.48807238 16.2409255 14.1880161 cg15535896 30.4880724
## 82 9.35012792 30.3539424 20.9446836 cg01680303 30.3539424
## 83 0.66494142 29.5735803 30.2976526 cg01667144 30.2976526
## 84 17.55718658 29.9353599 12.3190425 cg07523188 29.9353599
## 85 12.72225388 17.0912940 29.8726788 cg21854924 29.8726788
## 86 9.99476417 29.7475418 19.6936468 cg10750306 29.7475418
## 87 5.72549778 29.6192962 23.8346676 cg16579946 29.6192962
## 88 29.45584605 5.8732413 23.5234739 cg11438323 29.4558461
## 89 7.90591688 29.3699310 21.4048833 cg18821122 29.3699310
## 90 13.47506890 15.5239825 29.0581823 cg01128042 29.0581823
## 91 12.44251146 16.5156100 29.0172524 cg14564293 29.0172524
## 92 28.70364944 0.4438248 28.2006938 cg08198851 28.7036494
## 93 25.91934227 2.7083398 28.6868129 cg00696044 28.6868129
## 94 28.65073400 7.4912723 21.1003308 cg17421046 28.6507340
## 95 28.22916163 14.2410737 13.9289571 cg11331837 28.2291616
## 96 4.58143848 23.1881761 27.8287454 cg12682323 27.8287454
## 97 27.76407045 23.1524875 4.5524521 cg02932958 27.7640704
## 98 2.23438876 27.7093483 25.4158287 cg23658987 27.7093483
## 99 13.54531406 14.0663595 27.6708044 cg07480176 27.6708044
## 100 18.99608527 8.5697728 27.6249890 cg10738648 27.6249890
## 101 23.24342302 4.2307549 27.5333088 cg03071582 27.5333088
## 102 27.51218319 13.7211465 13.7319058 cg25758034 27.5121832
## 103 8.31892344 18.5119214 26.8899757 cg06118351 26.8899757
## 104 26.47568285 26.6877656 0.1529519 cg19512141 26.6877656
## 105 15.77820329 26.6266949 10.7893607 cg23161429 26.6266949
## 106 13.98395631 26.3981501 12.3550629 cg11247378 26.3981501
## 107 18.59425527 7.6889075 26.3422936 cg20678988 26.3422936
## 108 14.37330607 11.5502174 25.9826543 cg27086157 25.9826543
## 109 25.84846449 9.7819707 16.0073629 cg03088219 25.8484645
## 110 13.63204082 25.2790065 11.5878348 cg22274273 25.2790065
## 111 2.73202846 22.3681532 25.1593125 cg13885788 25.1593125
## 112 7.97490935 16.6875985 24.7216387 cg14240646 24.7216387
## 113 23.64920445 0.7936390 24.5019743 cg06112204 24.5019743
## 114 24.37942064 4.9143530 19.4059368 cg17429539 24.3794206
## 115 23.06031956 24.3605205 1.2410701 cg25561557 24.3605205
## 116 21.12251075 3.1401716 24.3218132 cg14293999 24.3218132
## 117 15.52461212 8.6507741 24.2345170 cg19377607 24.2345170
## 118 21.14489724 24.1161937 2.9121656 cg06950937 24.1161937
## 119 24.10030759 4.0940187 19.9471581 cg25436480 24.1003076
## 120 14.61554620 9.0258936 23.7005707 cg00272795 23.7005707
## 121 10.00948192 13.3941500 23.4627628 cg12012426 23.4627628
## 122 23.38852933 17.1911787 6.1382198 cg05321907 23.3885293
## 123 23.16383334 9.9827959 13.1219066 cg20139683 23.1638333
## 124 0.72466966 23.1298320 22.3460315 cg26069044 23.1298320
## 125 21.03326043 22.4244053 1.3320140 cg23916408 22.4244053
## 126 0.60816447 22.2322811 21.5649857 cg27341708 22.2322811
## 127 15.97168251 22.2117286 6.1809152 cg13080267 22.2117286
## 128 21.86773439 1.3035382 20.5050654 cg27272246 21.8677344
## 129 0.95871508 21.8471387 20.8292928 cg12284872 21.8471387
## 130 2.41389221 21.7049413 19.2319182 cg00689685 21.7049413
## 131 2.01953773 21.5333800 19.4547114 cg16178271 21.5333800
## 132 21.28126202 8.1255260 13.0966052 cg21209485 21.2812620
## 133 20.59895207 10.6008980 9.9389232 cg24851651 20.5989521
## 134 20.34289806 7.3326234 12.9511438 cg21697769 20.3428981
## 135 20.33374499 6.2181332 14.0564810 cg04664583 20.3337450
## 136 14.64603304 19.9415172 5.2363533 cg00999469 19.9415172
## 137 2.27365806 17.4302757 19.7630646 cg20370184 19.7630646
## 138 18.98361847 4.1866558 14.7378318 cg11187460 18.9836185
## 139 18.44110528 2.0022682 16.3797062 cg12784167 18.4411053
## 140 1.20240217 16.9911440 18.2526771 cg02320265 18.2526771
## 141 17.49711486 13.5814940 3.8564900 cg12776173 17.4971149
## 142 17.28620363 1.2806951 15.9463776 cg08779649 17.2862036
## 143 8.18664789 8.9921517 17.2379305 cg01933473 17.2379305
## 144 17.18897418 8.9556535 8.1741898 cg15501526 17.1889742
## 145 13.77899505 16.9406406 3.1025147 cg10985055 16.9406406
## 146 16.16970264 6.7553447 9.3552271 cg17906851 16.1697026
## 147 11.30016436 4.7162247 16.0755199 cg14307563 16.0755199
## 148 4.33754653 14.3186706 9.9219932 cg16788319 14.3186706
## 149 11.35637215 13.8497088 2.4342058 cg24506579 13.8497088
## 150 9.52822170 12.4287167 2.8413641 cg27639199 12.4287167
## 151 1.91402524 10.3049754 12.2781315 cg12466610 12.2781315
## 152 9.01275784 2.1922847 11.2641734 cg15633912 11.2641734
## 153 0.00000000 11.1759032 11.2350341 cg01413796 11.2350341
## 154 1.46197753 0.1924735 1.7135819 cg01549082 1.7135819
## 155 0.71164295 0.0102105 0.7809843 age.now 0.7809843
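The introduction names three selection strategies: mean importance, median-quantile importance, and frequency/common importance. They can be sketched on a toy per-class importance table; the feature names, values, and threshold below are hypothetical, not taken from the tables above:

```r
# Toy per-class importance table (hypothetical values, one row per CpG)
imp <- data.frame(
  Feature  = c("cg000001", "cg000002", "cg000003", "cg000004"),
  CN       = c(80, 10, 40, 5),
  Dementia = c(20, 90, 45, 6),
  MCI      = c(30, 15, 50, 7)
)
cls <- c("CN", "Dementia", "MCI")

# 1) Mean importance: rank features by average importance across classes
imp$MeanImportance <- rowMeans(imp[, cls])
top_mean <- imp$Feature[order(-imp$MeanImportance)][1:2]

# 2) Median-quantile importance: rank by the per-feature median across classes
imp$MedianImportance <- apply(imp[, cls], 1, median)
top_median <- imp$Feature[order(-imp$MedianImportance)][1:2]

# 3) Frequency / common importance: count how often a feature exceeds a
#    per-class threshold, then keep the most frequent (ties broken by mean)
threshold <- 25
imp$Freq <- rowSums(imp[, cls] > threshold)
top_freq <- imp$Feature[order(-imp$Freq, -imp$MeanImportance)][1:2]

print(top_mean); print(top_median); print(top_freq)
```

On real importance tables only the input data frame changes; the three ranking rules stay the same.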
if(METHOD_FEATURE_FLAG == 1){
importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
dplyr::select(-MaxImportance) %>%
reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_elastic_net_model1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_elastic_net_model1_df %>% head(20))
print("the top 20 features based on max way:")
print(head(importance_elastic_net_model1_df,n=20)$Feature)
importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_elastic_net_model1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
## CN Dementia MCI Feature MaxImportance
## 1 86.61981 100.000000 13.321062 PC1 100.00000
## 2 68.42071 88.612314 20.132472 PC2 88.61231
## 3 72.97240 12.365947 60.547325 cg00962106 72.97240
## 4 43.14232 18.833893 62.035348 cg02225060 62.03535
## 5 49.97512 8.979460 59.013713 cg02981548 59.01371
## 6 57.29678 15.766199 41.471452 cg23432430 57.29678
## 7 54.51398 8.371358 46.083495 cg14710850 54.51398
## 8 20.68642 33.679747 54.425296 cg16749614 54.42530
## 9 48.28493 54.281694 5.937636 cg07152869 54.28169
## 10 29.00490 24.416133 53.480167 cg08857872 53.48017
## 11 27.04479 25.380971 52.484887 cg16652920 52.48489
## 12 51.16762 42.097012 9.011481 cg26948066 51.16762
## 13 12.10171 38.684281 50.845127 PC3 50.84513
## 14 48.60847 1.041480 49.709080 cg08861434 49.70908
## 15 49.50311 29.755089 19.688889 cg27452255 49.50311
## 16 48.11513 20.550682 27.505314 cg09584650 48.11513
## 17 31.92435 15.800317 47.783794 cg11133939 47.78379
## 18 47.23932 44.922794 2.257394 cg19503462 47.23932
## 19 20.56741 46.479936 25.853398 cg06864789 46.47994
## 20 30.74659 14.690014 45.495730 cg02372404 45.49573
## [1] "the top 20 features based on max way:"
## [1] "PC1" "PC2" "cg00962106" "cg02225060" "cg02981548" "cg23432430" "cg14710850"
## [8] "cg16749614" "cg07152869" "cg08857872" "cg16652920" "cg26948066" "PC3" "cg08861434"
## [15] "cg27452255" "cg09584650" "cg11133939" "cg19503462" "cg06864789" "cg02372404"
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
modelTrain_ENM1_AUC <-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG ==6){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
modelTrain_ENM1_AUC <-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
modelTrain_ENM1_AUC <-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==1){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData_ENM1$DX)
for (class in classes) {
binary_labels <- ifelse(testData_ENM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = c("blue", seq_along(classes)[-1] + 1), lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.8682
## The AUC value for class CN is: 0.8681699
##
## Class: Dementia
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.8656
## The AUC value for class Dementia is: 0.8655844
##
## Class: MCI
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.8361
## The AUC value for class MCI is: 0.8361272
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
modelTrain_ENM1_AUC <-mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.8566272
print(modelTrain_ENM1_AUC)
## [1] 0.8566272
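The one-vs-rest AUC values that pROC reports above can also be obtained from the Mann-Whitney rank formulation. A minimal base-R sketch on hypothetical class probabilities (no pROC dependency); it should agree with `roc()$auc` up to tie handling:

```r
# AUC for one class-vs-rest split via the Mann-Whitney rank-sum formula
ovr_auc <- function(labels, scores, positive) {
  pos <- scores[labels == positive]
  neg <- scores[labels != positive]
  r <- rank(c(pos, neg))                     # mid-ranks handle ties
  (sum(r[seq_along(pos)]) - length(pos) * (length(pos) + 1) / 2) /
    (length(pos) * length(neg))
}

# Hypothetical predicted probabilities for a 3-class problem
set.seed(1)
dx    <- factor(sample(c("CN", "Dementia", "MCI"), 60, replace = TRUE))
probs <- matrix(runif(60 * 3), ncol = 3, dimnames = list(NULL, levels(dx)))
probs <- probs / rowSums(probs)              # normalize rows to sum to 1

aucs <- sapply(levels(dx), function(cl) ovr_auc(dx, probs[, cl], cl))
print(aucs)
print(mean(aucs))                            # macro-averaged OvR AUC
```

Averaging the per-class values gives the same macro "one versus rest" summary stored in `modelTrain_ENM1_AUC` above.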
library(caret)
library(xgboost)
library(dplyr)
library(doParallel)
# Start the parallel back end, leaving one core free
numCores <- detectCores() - 1
c2 <- makeCluster(numCores)
registerDoParallel(c2)
df_XGB1<-processed_data
featureName_XGB1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_XGB1$DX, p = 0.7, list = FALSE)
trainData_XGB1<- df_XGB1[trainIndex, ]
testData_XGB1 <- df_XGB1[-trainIndex, ]
cv_control <- trainControl(method = "cv", number = 5, allowParallel = TRUE)
xgb_model <- caret::train(
DX ~ ., data = trainData_XGB1,
method = "xgbTree", trControl = cv_control,
metric = "Accuracy"
)
print(xgb_model)
## eXtreme Gradient Boosting
##
## 455 samples
## 155 predictors
## 3 classes: 'CN', 'Dementia', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 364, 365, 363, 364, 364
## Resampling results across tuning parameters:
##
## eta max_depth colsample_bytree subsample nrounds Accuracy Kappa
## 0.3 1 0.6 0.50 50 0.5979264 0.2557362
## 0.3 1 0.6 0.50 100 0.5759956 0.2294683
## 0.3 1 0.6 0.50 150 0.5868647 0.2544769
## 0.3 1 0.6 0.75 50 0.5495987 0.1698425
## 0.3 1 0.6 0.75 100 0.5626878 0.1993241
## 0.3 1 0.6 0.75 150 0.5824202 0.2463429
## 0.3 1 0.6 1.00 50 0.5408547 0.1416942
## 0.3 1 0.6 1.00 100 0.5342358 0.1484288
## 0.3 1 0.6 1.00 150 0.5584361 0.1997970
## 0.3 1 0.8 0.50 50 0.5737490 0.2177328
## 0.3 1 0.8 0.50 100 0.5868886 0.2641710
## 0.3 1 0.8 0.50 150 0.5913330 0.2705229
## 0.3 1 0.8 0.75 50 0.5583171 0.1837199
## 0.3 1 0.8 0.75 100 0.5691835 0.2208545
## 0.3 1 0.8 0.75 150 0.5715751 0.2358985
## 0.3 1 0.8 1.00 50 0.5297924 0.1194723
## 0.3 1 0.8 1.00 100 0.5387036 0.1529392
## 0.3 1 0.8 1.00 150 0.5518904 0.1874991
## 0.3 2 0.6 0.50 50 0.5715268 0.2229975
## 0.3 2 0.6 0.50 100 0.5647890 0.2149601
## 0.3 2 0.6 0.50 150 0.5845931 0.2580084
## 0.3 2 0.6 0.75 50 0.5560939 0.1777674
## 0.3 2 0.6 0.75 100 0.5649812 0.2095009
## 0.3 2 0.6 0.75 150 0.5804146 0.2413210
## 0.3 2 0.6 1.00 50 0.5627839 0.1889220
## 0.3 2 0.6 1.00 100 0.5737007 0.2143958
## 0.3 2 0.6 1.00 150 0.5825891 0.2369697
## 0.3 2 0.8 0.50 50 0.5650783 0.2118362
## 0.3 2 0.8 0.50 100 0.5782428 0.2405047
## 0.3 2 0.8 0.50 150 0.5826140 0.2473843
## 0.3 2 0.8 0.75 50 0.5736529 0.2121701
## 0.3 2 0.8 0.75 100 0.5758507 0.2209327
## 0.3 2 0.8 0.75 150 0.5913558 0.2527139
## 0.3 2 0.8 1.00 50 0.5538716 0.1791261
## 0.3 2 0.8 1.00 100 0.5626151 0.2025712
## 0.3 2 0.8 1.00 150 0.5692090 0.2129927
## 0.3 3 0.6 0.50 50 0.5847162 0.2338449
## 0.3 3 0.6 0.50 100 0.5957536 0.2679650
## 0.3 3 0.6 0.50 150 0.6067675 0.2920283
## 0.3 3 0.6 0.75 50 0.5846669 0.2343243
## 0.3 3 0.6 0.75 100 0.5802952 0.2253302
## 0.3 3 0.6 0.75 150 0.5759234 0.2238336
## 0.3 3 0.6 1.00 50 0.5495010 0.1672815
## 0.3 3 0.6 1.00 100 0.5714795 0.2118475
## 0.3 3 0.6 1.00 150 0.5626878 0.2007798
## 0.3 3 0.8 0.50 50 0.5562160 0.1821684
## 0.3 3 0.8 0.50 100 0.5605877 0.1972578
## 0.3 3 0.8 0.50 150 0.5847651 0.2467598
## 0.3 3 0.8 0.75 50 0.5518188 0.1673963
## 0.3 3 0.8 0.75 100 0.5627356 0.1969223
## 0.3 3 0.8 0.75 150 0.5694017 0.2138141
## 0.3 3 0.8 1.00 50 0.5713580 0.2070527
## 0.3 3 0.8 1.00 100 0.5691841 0.2075598
## 0.3 3 0.8 1.00 150 0.5758751 0.2255635
## 0.4 1 0.6 0.50 50 0.5208791 0.1346972
## 0.4 1 0.6 0.50 100 0.5472777 0.1970138
## 0.4 1 0.6 0.50 150 0.5583883 0.2176027
## 0.4 1 0.6 0.75 50 0.5341381 0.1567189
## 0.4 1 0.6 0.75 100 0.5890864 0.2648557
## 0.4 1 0.6 0.75 150 0.5781207 0.2488912
## 0.4 1 0.6 1.00 50 0.5497431 0.1686314
## 0.4 1 0.6 1.00 100 0.5562133 0.1915082
## 0.4 1 0.6 1.00 150 0.5584116 0.2032226
## 0.4 1 0.8 0.50 50 0.5496698 0.1764031
## 0.4 1 0.8 0.50 100 0.5496210 0.1921672
## 0.4 1 0.8 0.50 150 0.5648851 0.2291330
## 0.4 1 0.8 0.75 50 0.5321357 0.1487241
## 0.4 1 0.8 0.75 100 0.5561432 0.2063757
## 0.4 1 0.8 0.75 150 0.5759468 0.2431895
## 0.4 1 0.8 1.00 50 0.5431491 0.1528527
## 0.4 1 0.8 1.00 100 0.5649578 0.2090746
## 0.4 1 0.8 1.00 150 0.5605855 0.2123300
## 0.4 2 0.6 0.50 50 0.5824208 0.2476564
## 0.4 2 0.6 0.50 100 0.5736773 0.2338540
## 0.4 2 0.6 0.50 150 0.5758996 0.2444243
## 0.4 2 0.6 0.75 50 0.5670595 0.2122257
## 0.4 2 0.6 0.75 100 0.5759956 0.2289802
## 0.4 2 0.6 0.75 150 0.5671317 0.2195227
## 0.4 2 0.6 1.00 50 0.5715284 0.2208683
## 0.4 2 0.6 1.00 100 0.5803180 0.2417389
## 0.4 2 0.6 1.00 150 0.5912104 0.2650706
## 0.4 2 0.8 0.50 50 0.5648622 0.2165493
## 0.4 2 0.8 0.50 100 0.5672289 0.2242254
## 0.4 2 0.8 0.50 150 0.5715517 0.2349181
## 0.4 2 0.8 0.75 50 0.5670101 0.2117268
## 0.4 2 0.8 0.75 100 0.5713320 0.2328243
## 0.4 2 0.8 0.75 150 0.5758003 0.2414781
## 0.4 2 0.8 1.00 50 0.5560949 0.1944241
## 0.4 2 0.8 1.00 100 0.5758518 0.2270638
## 0.4 2 0.8 1.00 150 0.5869618 0.2473621
## 0.4 3 0.6 0.50 50 0.5979997 0.2701671
## 0.4 3 0.6 0.50 100 0.5980480 0.2779443
## 0.4 3 0.6 0.50 150 0.6001736 0.2828485
## 0.4 3 0.6 0.75 50 0.5757546 0.2302640
## 0.4 3 0.6 0.75 100 0.5736285 0.2246279
## 0.4 3 0.6 0.75 150 0.5781446 0.2352394
## 0.4 3 0.6 1.00 50 0.5559473 0.1840425
## 0.4 3 0.6 1.00 100 0.5605617 0.1996699
## 0.4 3 0.6 1.00 150 0.5648845 0.2113145
## 0.4 3 0.8 0.50 50 0.5759734 0.2321133
## 0.4 3 0.8 0.50 100 0.5957063 0.2650687
## 0.4 3 0.8 0.50 150 0.5957058 0.2652292
## 0.4 3 0.8 0.75 50 0.5715517 0.2200919
## 0.4 3 0.8 0.75 100 0.5979976 0.2700231
## 0.4 3 0.8 0.75 150 0.5870080 0.2551672
## 0.4 3 0.8 1.00 50 0.5298890 0.1345156
## 0.4 3 0.8 1.00 100 0.5298407 0.1369925
## 0.4 3 0.8 1.00 150 0.5451776 0.1675295
##
## Tuning parameter 'gamma' was held constant at a value of 0
## Tuning parameter
## 'min_child_weight' was held constant at a value of 1
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were nrounds = 150, max_depth = 3, eta = 0.3, gamma =
## 0, colsample_bytree = 0.6, min_child_weight = 1 and subsample = 0.5.
mean_accuracy_xgb_model<- mean(xgb_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_xgb_model)
## [1] 0.5686429
modelTrain_mean_accuracy_cv_xgb <- mean_accuracy_xgb_model
print(modelTrain_mean_accuracy_cv_xgb)
## [1] 0.5686429
train_predictions <- predict(xgb_model, newdata = trainData_XGB1, type = "raw")
train_accuracy <- mean(train_predictions == trainData_XGB1$DX)
modelTrain_xgb_trainAccuracy <- train_accuracy
print(paste("Training Accuracy: ", modelTrain_xgb_trainAccuracy))
## [1] "Training Accuracy: 1"
predictions <- predict(xgb_model, newdata = testData_XGB1)
cm_modelTrain_xgb <- caret::confusionMatrix(predictions,testData_XGB1$DX)
print(cm_modelTrain_xgb)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN Dementia MCI
## CN 25 4 18
## Dementia 4 8 1
## MCI 37 16 80
##
## Overall Statistics
##
## Accuracy : 0.5855
## 95% CI : (0.5125, 0.6558)
## No Information Rate : 0.513
## P-Value [Acc > NIR] : 0.0256922
##
## Kappa : 0.2511
##
## Mcnemar's Test P-Value : 0.0001868
##
## Statistics by Class:
##
## Class: CN Class: Dementia Class: MCI
## Sensitivity 0.3788 0.28571 0.8081
## Specificity 0.8268 0.96970 0.4362
## Pos Pred Value 0.5319 0.61538 0.6015
## Neg Pred Value 0.7192 0.88889 0.6833
## Prevalence 0.3420 0.14508 0.5130
## Detection Rate 0.1295 0.04145 0.4145
## Detection Prevalence 0.2435 0.06736 0.6891
## Balanced Accuracy 0.6028 0.62771 0.6221
cm_modelTrain_xgb_Accuracy <- cm_modelTrain_xgb$overall["Accuracy"]
cm_modelTrain_xgb_Kappa <- cm_modelTrain_xgb$overall["Kappa"]
print(cm_modelTrain_xgb_Accuracy)
## Accuracy
## 0.5854922
print(cm_modelTrain_xgb_Kappa)
## Kappa
## 0.2510671
importance_xgb_model<- varImp(xgb_model)
print(importance_xgb_model)
## xgbTree variable importance
##
## only 20 most important variables shown (out of 155)
##
## Overall
## age.now 100.00
## cg05096415 58.67
## cg15501526 53.97
## cg00962106 52.79
## cg16652920 51.84
## cg14564293 50.39
## cg06864789 50.38
## cg25259265 49.28
## cg04412904 48.22
## cg08857872 46.76
## cg09584650 45.71
## cg01921484 44.85
## cg01128042 42.34
## cg16771215 42.19
## cg02621446 41.35
## cg02981548 40.94
## cg15865722 38.31
## cg03327352 37.93
## cg26948066 37.59
## cg02494911 36.54
plot(importance_xgb_model, top = 20, main = "Variable Importance Plot")
importance_xgb_model_df<-importance_xgb_model$importance
importance <- xgb.importance(model = xgb_model$finalModel)
xgb.plot.importance(importance_matrix = importance)
ordered_importance <- importance[order(-importance$Importance), ]
print(ordered_importance)
## Feature Gain Cover Frequency Importance
## <char> <num> <num> <num> <num>
## 1: age.now 0.0277964662 0.0305758717 0.014948454 0.0277964662
## 2: cg05096415 0.0165271264 0.0151987640 0.009793814 0.0165271264
## 3: cg15501526 0.0152436978 0.0079960962 0.008762887 0.0152436978
## 4: cg00962106 0.0149232198 0.0145429395 0.009793814 0.0149232198
## 5: cg16652920 0.0146648709 0.0104445177 0.007216495 0.0146648709
## ---
## 151: cg04664583 0.0012122062 0.0005288135 0.001030928 0.0012122062
## 152: cg06112204 0.0010745149 0.0024687596 0.004123711 0.0010745149
## 153: cg20678988 0.0010299784 0.0024307498 0.003092784 0.0010299784
## 154: cg07480176 0.0007840553 0.0011485574 0.002577320 0.0007840553
## 155: cg27452255 0.0005271072 0.0008672921 0.002577320 0.0005271072
stopCluster(c2)
registerDoSEQ()
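The cluster above is stopped explicitly after training; if `train()` errored first, the workers would leak. A defensive variant wraps setup and teardown in a helper with `on.exit()` — a sketch using only the base `parallel` package (the `with_cluster` helper is hypothetical, not part of the pipeline):

```r
library(parallel)

# Run work on a temporary cluster that is always torn down,
# even if the worker function throws an error mid-way.
with_cluster <- function(fun, n_cores = 2) {
  cl <- makeCluster(n_cores)
  on.exit(stopCluster(cl), add = TRUE)  # guaranteed cleanup
  fun(cl)
}

squares <- with_cluster(function(cl) parSapply(cl, 1:5, function(x) x^2))
print(squares)
```

The same pattern applies to the `doParallel` clusters used here: registering the teardown immediately after `makeCluster()` keeps the number of live R worker processes bounded across reruns of the document.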
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
modelTrain_xgb_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
modelTrain_xgb_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
modelTrain_xgb_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData_XGB1$DX)
for (class in classes) {
binary_labels <- ifelse(testData_XGB1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = c("blue", seq_along(classes)[-1] + 1), lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.7177
## The AUC value for class CN is: 0.7177285
##
## Class: Dementia
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.7652
## The AUC value for class Dementia is: 0.7651515
##
## Class: MCI
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.7245
## The AUC value for class MCI is: 0.7244788
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
modelTrain_xgb_AUC<-mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.7357863
print(modelTrain_xgb_AUC)
## [1] 0.7357863
library(caret)
library(randomForest)
df_RFM1<-processed_data
featureName_RFM1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_RFM1$DX, p = 0.7, list = FALSE)
train_data_RFM1 <- df_RFM1[trainIndex, ]
test_data_RFM1 <- df_RFM1[-trainIndex, ]
X_train_RFM1 <- subset(train_data_RFM1, select = -DX)
y_train_RFM1 <- train_data_RFM1$DX
X_test_RFM1 <- subset(test_data_RFM1, select = -DX)
y_test_RFM1 <- test_data_RFM1$DX
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)
rf_model <- caret::train(
DX ~ ., data = train_data_RFM1,
method = "rf", trControl = ctrl,
metric = "Accuracy",
importance = TRUE
)
print(rf_model)
## Random Forest
##
## 455 samples
## 155 predictors
## 3 classes: 'CN', 'Dementia', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 364, 365, 363, 364, 364
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.5363147 0.05522911
## 78 0.5604672 0.13791728
## 155 0.5451298 0.10733955
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 78.
mean_accuracy_rf_model<- mean(rf_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
modelTrain_mean_accuracy_cv_rf <- mean_accuracy_rf_model
print(modelTrain_mean_accuracy_cv_rf)
## [1] 0.5473039
train_predictions <- predict(rf_model, newdata = train_data_RFM1, type = "raw")
train_accuracy <- mean(train_predictions == train_data_RFM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
modelTrain_rf_trainAccuracy <- train_accuracy
print(modelTrain_rf_trainAccuracy)
## [1] 1
predictions <- predict(rf_model, newdata = test_data_RFM1)
cm_modelTrain_rf <- caret::confusionMatrix(predictions,test_data_RFM1$DX)
print(cm_modelTrain_rf)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN Dementia MCI
## CN 19 7 8
## Dementia 0 0 0
## MCI 47 21 91
##
## Overall Statistics
##
## Accuracy : 0.5699
## 95% CI : (0.4969, 0.6408)
## No Information Rate : 0.513
## P-Value [Acc > NIR] : 0.06504
##
## Kappa : 0.1684
##
## Mcnemar's Test P-Value : 4.978e-12
##
## Statistics by Class:
##
## Class: CN Class: Dementia Class: MCI
## Sensitivity 0.28788 0.0000 0.9192
## Specificity 0.88189 1.0000 0.2766
## Pos Pred Value 0.55882 NaN 0.5723
## Neg Pred Value 0.70440 0.8549 0.7647
## Prevalence 0.34197 0.1451 0.5130
## Detection Rate 0.09845 0.0000 0.4715
## Detection Prevalence 0.17617 0.0000 0.8238
## Balanced Accuracy 0.58488 0.5000 0.5979
cm_modelTrain_rf_Accuracy <- cm_modelTrain_rf$overall["Accuracy"]
cm_modelTrain_rf_Kappa <- cm_modelTrain_rf$overall["Kappa"]
print(cm_modelTrain_rf_Accuracy)
## Accuracy
## 0.5699482
print(cm_modelTrain_rf_Kappa)
## Kappa
## 0.1684489
importance_rf_model <- varImp(rf_model)
print(importance_rf_model)
## rf variable importance
##
## variables are sorted by maximum importance across the classes
## only 20 most important variables shown (out of 155)
##
## CN Dementia MCI
## cg15501526 51.3208 12.11 100.000
## cg01153376 10.9139 54.97 78.593
## cg08857872 48.2026 34.71 65.332
## cg12279734 37.8346 63.84 37.870
## cg06864789 28.4816 58.20 19.875
## cg00962106 45.6839 35.98 57.926
## cg23658987 57.5291 20.66 30.886
## age.now 33.4227 48.57 56.948
## cg16652920 13.5100 31.95 56.438
## cg01921484 33.6159 19.10 54.046
## cg14293999 20.8676 18.92 53.149
## cg25259265 29.4613 51.43 52.515
## cg02494911 0.7986 37.06 52.461
## cg05570109 11.5290 42.89 51.497
## cg21209485 24.8994 51.40 19.205
## cg16579946 25.7024 23.83 49.530
## cg14710850 25.4326 16.41 49.017
## cg17186592 31.5157 49.02 40.990
## cg14924512 19.1858 48.90 34.370
## cg07523188 48.7428 31.32 9.632
plot(importance_rf_model, top = 20, main = "Variable Importance Plot")
importance_rf_model_df<-importance_rf_model$importance
if(METHOD_FEATURE_FLAG==5){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(MCI))
print(Ordered_importance_rf_final_model)
}
if(METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==6){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(Dementia))
print(Ordered_importance_rf_final_model)
}
if(METHOD_FEATURE_FLAG==3){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(CI))
print(Ordered_importance_rf_final_model)
}
if(METHOD_FEATURE_FLAG==1){
# for the multi classification case,
# for each feature, we will choose the maximum importance value
# Add a column for the maximum importance
importance_rf_model_df$Feature<-rownames(importance_rf_model_df)
importance_rf_model_df <- importance_rf_model_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_rf_model_df)
}
## CN Dementia MCI Feature MaxImportance
## 1 51.3207894 12.108173 100.000000 cg15501526 100.00000
## 2 10.9138752 54.970568 78.592773 cg01153376 78.59277
## 3 48.2026308 34.713172 65.331798 cg08857872 65.33180
## 4 37.8346214 63.835427 37.870293 cg12279734 63.83543
## 5 28.4815869 58.199940 19.874651 cg06864789 58.19994
## 6 45.6839260 35.980721 57.926383 cg00962106 57.92638
## 7 57.5290605 20.659600 30.885574 cg23658987 57.52906
## 8 33.4227207 48.570461 56.947839 age.now 56.94784
## 9 13.5100360 31.948580 56.438010 cg16652920 56.43801
## 10 33.6159012 19.104834 54.046479 cg01921484 54.04648
## 11 20.8675761 18.919852 53.148657 cg14293999 53.14866
## 12 29.4613424 51.431625 52.515092 cg25259265 52.51509
## 13 0.7986121 37.064166 52.461334 cg02494911 52.46133
## 14 11.5289680 42.894354 51.496632 cg05570109 51.49663
## 15 24.8993598 51.404685 19.204564 cg21209485 51.40469
## 16 25.7024262 23.834731 49.529940 cg16579946 49.52994
## 17 25.4326260 16.410955 49.017250 cg14710850 49.01725
## 18 31.5157046 49.015720 40.989937 cg17186592 49.01572
## 19 19.1857976 48.896195 34.370457 cg14924512 48.89619
## 20 48.7427629 31.319425 9.631507 cg07523188 48.74276
## 21 40.6221994 48.712596 34.953845 cg27639199 48.71260
## 22 48.0206414 25.494056 28.354834 cg11133939 48.02064
## 23 47.5935714 36.219234 37.732426 cg00154902 47.59357
## 24 38.9462725 26.351441 47.583103 cg27086157 47.58310
## 25 18.1683746 32.064900 47.507694 cg11331837 47.50769
## 26 30.6350882 44.589577 47.479350 cg04664583 47.47935
## 27 31.3935616 43.522626 47.450571 cg02621446 47.45057
## 28 47.0986275 39.596778 31.432181 cg16771215 47.09863
## 29 23.4664486 33.055876 47.053678 cg10738648 47.05368
## 30 35.6246443 34.742753 46.714868 cg09854620 46.71487
## 31 13.0742870 46.562383 29.908991 cg23916408 46.56238
## 32 46.3269131 42.570326 17.443387 cg25561557 46.32691
## 33 46.2954606 27.241196 29.628992 cg23432430 46.29546
## 34 45.8511559 24.268446 39.531834 cg10240127 45.85116
## 35 35.3034617 34.279440 45.628304 cg03084184 45.62830
## 36 20.4385245 32.574263 45.534112 cg12228670 45.53411
## 37 22.5847973 23.657220 45.377279 cg20370184 45.37728
## 38 44.9736608 34.789021 18.313085 cg27577781 44.97366
## 39 44.1773684 24.876401 17.122350 cg10369879 44.17737
## 40 29.0142791 22.200580 44.059881 cg06118351 44.05988
## 41 33.6375593 44.033332 36.441838 cg14564293 44.03333
## 42 31.8262561 43.913244 25.732220 cg24859648 43.91324
## 43 18.8419132 43.856618 18.311267 cg27341708 43.85662
## 44 20.0321996 43.647524 31.312070 cg05096415 43.64752
## 45 38.6026083 20.420003 43.564278 cg04412904 43.56428
## 46 26.9174668 43.147902 19.879603 cg18339359 43.14790
## 47 25.3216670 43.116395 30.472543 cg00322003 43.11639
## 48 37.7708111 30.814926 42.181775 cg10985055 42.18178
## 49 42.0129171 27.265964 37.279751 cg05234269 42.01292
## 50 19.9269499 41.980414 23.295586 cg12534577 41.98041
## 51 20.8633676 41.974223 31.281430 cg00999469 41.97422
## 52 29.0897665 41.971921 34.320613 cg16178271 41.97192
## 53 41.9470664 28.823836 20.424692 cg05321907 41.94707
## 54 29.8582156 41.845388 32.224005 cg26948066 41.84539
## 55 39.4537402 35.363548 41.742264 cg17429539 41.74226
## 56 41.6746653 36.721771 17.974660 cg13885788 41.67467
## 57 16.3528060 27.559903 41.529725 cg01549082 41.52973
## 58 27.8506003 11.269979 41.440149 cg01013522 41.44015
## 59 34.7582085 15.224071 41.177109 cg12466610 41.17711
## 60 40.7441049 28.379776 12.261373 cg01680303 40.74410
## 61 23.2554475 40.716489 19.137615 cg06697310 40.71649
## 62 40.4904076 24.211959 25.412176 cg01667144 40.49041
## 63 28.2016064 40.319173 16.862264 cg20913114 40.31917
## 64 9.0074309 40.042779 33.004879 cg02320265 40.04278
## 65 24.9235624 25.915937 39.999881 cg24873924 39.99988
## 66 28.4980181 9.808450 39.957892 cg17906851 39.95789
## 67 39.7728348 28.934544 29.411513 cg03327352 39.77283
## 68 35.3654197 10.561828 39.584403 cg14240646 39.58440
## 69 30.3272001 13.437584 39.492560 cg27272246 39.49256
## 70 35.9409198 39.388577 38.182389 cg02225060 39.38858
## 71 39.3558964 31.408248 19.442412 cg09584650 39.35590
## 72 13.2425645 39.327065 36.842720 cg00247094 39.32707
## 73 32.6987423 37.479267 39.315843 cg01128042 39.31584
## 74 8.6379945 18.821716 39.237911 cg15535896 39.23791
## 75 39.2312131 23.803327 33.203209 cg11187460 39.23121
## 76 39.1274904 20.618820 26.386871 cg24506579 39.12749
## 77 24.2227155 31.155447 38.983723 cg02981548 38.98372
## 78 38.8417423 34.093954 37.621267 cg26757229 38.84174
## 79 38.4691112 16.234568 35.175021 PC2 38.46911
## 80 38.3379213 25.516240 14.240647 cg20507276 38.33792
## 81 32.5200267 33.419767 38.268589 cg15775217 38.26859
## 82 32.6932126 5.572029 38.205181 cg12146221 38.20518
## 83 36.7978241 38.190693 36.347623 cg07028768 38.19069
## 84 32.8869773 16.839476 38.161479 PC3 38.16148
## 85 7.6831196 37.792686 23.243355 cg03982462 37.79269
## 86 25.0462440 26.303589 37.563028 cg02372404 37.56303
## 87 20.2453361 37.306923 8.526062 cg23161429 37.30692
## 88 31.2255588 30.121119 37.195451 cg19512141 37.19545
## 89 33.0451528 18.246981 37.187715 cg06715136 37.18771
## 90 32.7260270 33.631567 37.171932 cg17421046 37.17193
## 91 29.0856804 20.970543 36.793133 cg12284872 36.79313
## 92 24.2592202 36.445780 32.816724 cg12682323 36.44578
## 93 24.6874839 14.739186 36.422434 cg25879395 36.42243
## 94 36.0798290 29.909161 0.000000 cg06950937 36.07983
## 95 30.9845240 25.130094 35.964877 cg26219488 35.96488
## 96 29.7106058 26.260707 35.453553 cg27452255 35.45355
## 97 29.7340080 35.290011 33.036275 cg00616572 35.29001
## 98 9.3209127 27.239359 35.088114 cg14527649 35.08811
## 99 25.3936604 34.696248 15.141832 cg18819889 34.69625
## 100 34.6732058 26.388891 12.211602 cg07152869 34.67321
## 101 32.4918424 14.284514 34.474142 cg08198851 34.47414
## 102 33.3990786 34.361560 22.016766 cg00689685 34.36156
## 103 23.6005807 34.304993 30.468770 cg00675157 34.30499
## 104 33.6208352 29.434168 34.244628 cg14307563 34.24463
## 105 18.0087674 33.891664 16.710074 cg07480176 33.89166
## 106 29.3104005 32.346665 33.615513 cg24861747 33.61551
## 107 26.8480472 24.509324 33.615241 cg01933473 33.61524
## 108 30.7311397 28.756002 33.587076 cg26069044 33.58708
## 109 33.5837621 18.489991 27.013304 cg11247378 33.58376
## 110 9.8866482 33.445459 27.570780 cg03071582 33.44546
## 111 21.6458703 31.711592 33.324979 cg19377607 33.32498
## 112 21.3482102 21.859410 33.231961 cg03088219 33.23196
## 113 30.9487918 15.495092 32.814968 cg20685672 32.81497
## 114 32.5626185 8.111475 27.626780 cg25758034 32.56262
## 115 32.4465565 23.319557 22.746918 cg06112204 32.44656
## 116 29.2607042 12.499356 32.434410 cg08861434 32.43441
## 117 26.5397967 32.336843 30.530235 cg16788319 32.33684
## 118 23.2864860 31.787986 32.316051 cg00696044 32.31605
## 119 11.8854076 25.029164 32.199893 cg12784167 32.19989
## 120 18.7416463 13.572058 32.182320 cg08779649 32.18232
## 121 25.3778152 31.987940 26.855880 cg12738248 31.98794
## 122 30.3577137 18.983288 31.966732 cg21697769 31.96673
## 123 31.6246521 26.850609 31.747640 cg16715186 31.74764
## 124 23.2012161 31.453217 29.395608 cg18821122 31.45322
## 125 31.2180154 21.950730 12.433383 cg15633912 31.21802
## 126 27.7131591 30.275814 31.034618 cg04248279 31.03462
## 127 30.9693924 26.006295 29.916371 PC1 30.96939
## 128 30.5366925 22.831281 12.174570 cg03129555 30.53669
## 129 25.9166007 23.145547 30.504210 cg15865722 30.50421
## 130 26.9351172 30.327838 26.409878 cg03660162 30.32784
## 131 16.3975287 30.173891 18.247525 cg26474732 30.17389
## 132 29.9984974 21.498342 24.947071 cg06378561 29.99850
## 133 29.9174258 17.304786 27.198307 cg13080267 29.91743
## 134 26.2555342 29.565324 19.362089 cg17738613 29.56532
## 135 11.0786764 28.941198 16.463878 cg22274273 28.94120
## 136 24.7057168 28.302285 27.745206 cg00272795 28.30229
## 137 20.9562058 23.838892 28.153301 cg16749614 28.15330
## 138 15.4188235 27.981804 18.241912 cg02932958 27.98180
## 139 23.3813797 27.669491 13.993777 cg08584917 27.66949
## 140 22.0653600 9.948338 27.607669 cg01413796 27.60767
## 141 25.2568598 23.157348 27.477514 cg10750306 27.47751
## 142 17.8059395 27.170186 22.531944 cg21854924 27.17019
## 143 27.0883280 24.964319 16.397580 cg12012426 27.08833
## 144 23.4371413 26.584314 21.675006 cg07138269 26.58431
## 145 25.9385491 25.725706 13.733208 cg19503462 25.93855
## 146 10.0013844 25.606045 17.249206 cg12776173 25.60605
## 147 25.3789296 22.629852 1.361451 cg02356645 25.37893
## 148 21.1725288 24.971799 16.752787 cg06536614 24.97180
## 149 20.8261680 24.426003 16.148522 cg11227702 24.42600
## 150 15.1961035 22.975841 24.257491 cg20139683 24.25749
## 151 22.1006728 20.385236 24.018683 cg11438323 24.01868
## 152 21.3477705 13.788454 23.361039 cg24851651 23.36104
## 153 15.1127270 22.916516 20.702346 cg20678988 22.91652
## 154 16.3609230 22.612790 16.644260 cg05841700 22.61279
## 155 10.2231141 21.190076 17.219737 cg25436480 21.19008
if(METHOD_FEATURE_FLAG == 1){
importance_melted_rf_model_df <- importance_rf_model_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_rf_model_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
## Warning: The melt generic in data.table has been passed a data.frame and will attempt to
## redirect to the relevant reshape2 method; please note that reshape2 is superseded and is no
## longer actively developed, and this redirection is now deprecated. To continue using melt
## methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the
## namespace, i.e. reshape2::melt(.). In the next version, this warning will become an error.
if(METHOD_FEATURE_FLAG == 1){
print(importance_rf_model_df %>% head(20))
print("the top 20 features based on max way:")
print(head(importance_rf_model_df,n=20)$Feature)
importance_melted_rf_model_df <- importance_rf_model_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_rf_model_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
## CN Dementia MCI Feature MaxImportance
## 1 51.3207894 12.10817 100.000000 cg15501526 100.00000
## 2 10.9138752 54.97057 78.592773 cg01153376 78.59277
## 3 48.2026308 34.71317 65.331798 cg08857872 65.33180
## 4 37.8346214 63.83543 37.870293 cg12279734 63.83543
## 5 28.4815869 58.19994 19.874651 cg06864789 58.19994
## 6 45.6839260 35.98072 57.926383 cg00962106 57.92638
## 7 57.5290605 20.65960 30.885574 cg23658987 57.52906
## 8 33.4227207 48.57046 56.947839 age.now 56.94784
## 9 13.5100360 31.94858 56.438010 cg16652920 56.43801
## 10 33.6159012 19.10483 54.046479 cg01921484 54.04648
## 11 20.8675761 18.91985 53.148657 cg14293999 53.14866
## 12 29.4613424 51.43162 52.515092 cg25259265 52.51509
## 13 0.7986121 37.06417 52.461334 cg02494911 52.46133
## 14 11.5289680 42.89435 51.496632 cg05570109 51.49663
## 15 24.8993598 51.40469 19.204564 cg21209485 51.40469
## 16 25.7024262 23.83473 49.529940 cg16579946 49.52994
## 17 25.4326260 16.41095 49.017250 cg14710850 49.01725
## 18 31.5157046 49.01572 40.989937 cg17186592 49.01572
## 19 19.1857976 48.89619 34.370457 cg14924512 48.89619
## 20 48.7427629 31.31943 9.631507 cg07523188 48.74276
## [1] "the top 20 features based on max way:"
## [1] "cg15501526" "cg01153376" "cg08857872" "cg12279734" "cg06864789" "cg00962106" "cg23658987"
## [8] "age.now" "cg16652920" "cg01921484" "cg14293999" "cg25259265" "cg02494911" "cg05570109"
## [15] "cg21209485" "cg16579946" "cg14710850" "cg17186592" "cg14924512" "cg07523188"
## Warning: The melt generic in data.table has been passed a data.frame and will attempt to
## redirect to the relevant reshape2 method; please note that reshape2 is superseded and is no
## longer actively developed, and this redirection is now deprecated. To continue using melt
## methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the
## namespace, i.e. reshape2::melt(.). In the next version, this warning will become an error.
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
print(auc_value)
modelTrain_rf_AUC <- auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
print(auc_value)
modelTrain_rf_AUC <- auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
print(auc_value)
modelTrain_rf_AUC <- auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(test_data_RFM1$DX)
for (class in classes) {
binary_labels <- ifelse(test_data_RFM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
# Use the same palette indices (2, 3, 4, ...) for the curves and the legend.
plot(roc_curves[[1]], col = 2,
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i + 1, lwd = 2)
}
legend("bottomright", legend = classes, col = seq_along(classes) + 1, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.6845
## The AUC value for class CN is: 0.6845025
##
## Class: Dementia
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.6235
## The AUC value for class Dementia is: 0.6234848
##
## Class: MCI
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.6347
## The AUC value for class MCI is: 0.634698
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
modelTrain_rf_AUC <- mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.6475618
print(modelTrain_rf_AUC)
## [1] 0.6475618
df_SVM<-processed_data
featureName_SVM1<-AfterProcess_FeatureName
trainIndex <- createDataPartition(df_SVM$DX, p = 0.7, list = FALSE)
train_data_SVM1 <- df_SVM[trainIndex, ]
test_data_SVM1 <- df_SVM[-trainIndex, ]
X_train_SVM1 <- subset(train_data_SVM1,select = -DX)
y_train_SVM1 <- train_data_SVM1$DX
X_test_SVM1 <- subset(test_data_SVM1, select= -DX )
y_test_SVM1 <- test_data_SVM1$DX
train_control <- trainControl(method = "cv", number = 5, classProbs = TRUE)
svm_model <- caret::train(DX ~ ., data = train_data_SVM1,
method = "svmRadial",
trControl = train_control)
print(svm_model)
## Support Vector Machines with Radial Basis Function Kernel
##
## 455 samples
## 155 predictors
## 3 classes: 'CN', 'Dementia', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 364, 364, 364, 365, 363
## Resampling results across tuning parameters:
##
## C Accuracy Kappa
## 0.25 0.7142963 0.5300926
## 0.50 0.7142719 0.5287505
## 1.00 0.7119525 0.5147656
##
## Tuning parameter 'sigma' was held constant at a value of 0.003301995
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.003301995 and C = 0.25.
print(svm_model$bestTune)
## sigma C
## 1 0.003301995 0.25
mean_accuracy_svm_model<- mean(svm_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_svm_model)
## [1] 0.7135069
modelTrain_mean_accuracy_cv_svm <- mean_accuracy_svm_model
print(modelTrain_mean_accuracy_cv_svm)
## [1] 0.7135069
train_predictions <- predict(svm_model, newdata = train_data_SVM1)
train_accuracy <- mean(train_predictions == train_data_SVM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.938461538461538"
modelTrain_svm_trainAccuracy <-train_accuracy
print(modelTrain_svm_trainAccuracy)
## [1] 0.9384615
predictions <- predict(svm_model, newdata = test_data_SVM1)
cm_modelTrain_svm <- caret::confusionMatrix(predictions,test_data_SVM1$DX)
print(cm_modelTrain_svm)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN Dementia MCI
## CN 51 5 31
## Dementia 2 18 9
## MCI 13 5 59
##
## Overall Statistics
##
## Accuracy : 0.6632
## 95% CI : (0.5918, 0.7295)
## No Information Rate : 0.513
## P-Value [Acc > NIR] : 1.708e-05
##
## Kappa : 0.4563
##
## Mcnemar's Test P-Value : 0.02042
##
## Statistics by Class:
##
## Class: CN Class: Dementia Class: MCI
## Sensitivity 0.7727 0.64286 0.5960
## Specificity 0.7165 0.93333 0.8085
## Pos Pred Value 0.5862 0.62069 0.7662
## Neg Pred Value 0.8585 0.93902 0.6552
## Prevalence 0.3420 0.14508 0.5130
## Detection Rate 0.2642 0.09326 0.3057
## Detection Prevalence 0.4508 0.15026 0.3990
## Balanced Accuracy 0.7446 0.78810 0.7022
cm_modelTrain_svm_Accuracy <- cm_modelTrain_svm$overall["Accuracy"]
cm_modelTrain_svm_Kappa <- cm_modelTrain_svm$overall["Kappa"]
print(cm_modelTrain_svm_Accuracy)
## Accuracy
## 0.6632124
print(cm_modelTrain_svm_Kappa)
## Kappa
## 0.4562673
Let’s take a look at the feature importance of the trained model.
library(iml)
predictor_SVM <- Predictor$new(svm_model,data = df_SVM,y=df_SVM$DX)
importance_SVM <- FeatureImp$new(predictor_SVM,loss="ce")
print(importance_SVM)
## Interpretation method: FeatureImp
## error function: ce
##
## Analysed predictor:
## Prediction task: classification
## Classes:
##
## Analysed data:
## Sampling from data.frame with 648 rows and 156 columns.
##
##
## Head of results:
## feature importance.05 importance importance.95 permutation.error
## 1 cg25879395 1.047312 1.075269 1.094624 0.1543210
## 2 age.now 1.017204 1.053763 1.075269 0.1512346
## 3 cg00999469 1.032258 1.043011 1.051613 0.1496914
## 4 cg26069044 1.025806 1.043011 1.062366 0.1496914
## 5 cg05096415 1.021505 1.043011 1.062366 0.1496914
## 6 cg01921484 0.972043 1.043011 1.053763 0.1496914
plot(importance_SVM)
library(vip)
vip(svm_model, method = "permute", train = train_data_SVM1, target = "DX",
nsim = 10, metric = "bal_accuracy", pred_wrapper = predict)
importance_SVM_df<-importance_SVM$results
if(METHOD_FEATURE_FLAG == 5){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The AUC value is:")
auc_value <- roc_curve$auc
modelTrain_svm_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4|| METHOD_FEATURE_FLAG==6){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The AUC value is:")
auc_value <- roc_curve$auc
modelTrain_svm_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The AUC value is:")
auc_value <- roc_curve$auc
modelTrain_svm_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(test_data_SVM1$DX)
for (class in classes) {
binary_labels <- ifelse(test_data_SVM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
# Use the same palette indices (2, 3, 4, ...) for the curves and the legend.
plot(roc_curves[[1]], col = 2,
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i + 1, lwd = 2)
}
legend("bottomright", legend = classes, col = seq_along(classes) + 1, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.5126
## The AUC value for class CN is: 0.5126461
##
## Class: Dementia
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.6271
## The AUC value for class Dementia is: 0.6270563
##
## Class: MCI
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.5456
## The AUC value for class MCI is: 0.545562
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
modelTrain_svm_AUC <- mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.5617548
# Go to the "INPUT" section to set the number of common features needed
NUM_COMMON_FEATURES <- NUM_COMMON_FEATURES_SET
The feature importances cannot be combined directly, since they are not all on the same scale; for example, the SVM model uses a different method to compute feature importance.
So let’s scale the importances to bring them into the same range.
First, let’s process each data frame to ensure they all have a consistent format.
if(METHOD_FEATURE_FLAG == 3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 || METHOD_FEATURE_FLAG==6){
# Process the dataframe to ensure they have consistent format.
# SVM
importance_SVM_df_processed<-importance_SVM_df[,c("importance","feature")]
colnames(importance_SVM_df_processed)[colnames(importance_SVM_df_processed) == "feature"] <- "Feature"
colnames(importance_SVM_df_processed)[colnames(importance_SVM_df_processed) == "importance"] <- "Importance_SVM"
head(importance_SVM_df_processed)
# LRM
importance_model_LRM1_df_processed<-importance_model_LRM1_df
importance_model_LRM1_df_processed$Feature<-rownames(importance_model_LRM1_df_processed)
colnames(importance_model_LRM1_df_processed)[colnames(importance_model_LRM1_df_processed) == "Overall"] <- "Importance_LRM1"
head(importance_model_LRM1_df_processed)
# Elastic Net
importance_elastic_net_model1_df_processed<-importance_elastic_net_model1_df
importance_elastic_net_model1_df_processed$Feature<-rownames(importance_elastic_net_model1_df_processed)
colnames(importance_elastic_net_model1_df_processed)[colnames(importance_elastic_net_model1_df_processed) == "Overall"] <- "Importance_ENM1"
head(importance_elastic_net_model1_df_processed)
# XGBoost
importance_xgb_model_df_processed<-importance_xgb_model_df
importance_xgb_model_df_processed$Feature<-rownames(importance_xgb_model_df_processed)
colnames(importance_xgb_model_df_processed)[colnames(importance_xgb_model_df_processed) == "Overall"] <- "Importance_XGB"
head(importance_xgb_model_df_processed)
# RF
importance_rf_model_df_processed <- importance_rf_model_df
if (METHOD_FEATURE_FLAG_NUM == 3){
importance_rf_model_df_processed$Importance <- rowMeans(importance_rf_model_df_processed)
importance_rf_model_df_processed$Feature <- rownames(importance_rf_model_df_processed)
importance_rf_model_df_processed <- subset(importance_rf_model_df_processed, select = -c(CI, CN))
colnames(importance_rf_model_df_processed)[colnames(importance_rf_model_df_processed) == "Importance"] <- "Importance_RF"
}
if (METHOD_FEATURE_FLAG_NUM == 4){
importance_rf_model_df_processed$Importance <- rowMeans(importance_rf_model_df_processed)
importance_rf_model_df_processed$Feature <- rownames(importance_rf_model_df_processed)
importance_rf_model_df_processed <- subset(importance_rf_model_df_processed, select = -c(Dementia, CN))
colnames(importance_rf_model_df_processed)[colnames(importance_rf_model_df_processed) == "Importance"] <- "Importance_RF"
}
if (METHOD_FEATURE_FLAG_NUM == 5){
importance_rf_model_df_processed$Importance <- rowMeans(importance_rf_model_df_processed)
importance_rf_model_df_processed$Feature <- rownames(importance_rf_model_df_processed)
importance_rf_model_df_processed <- subset(importance_rf_model_df_processed, select = -c(MCI, CN))
colnames(importance_rf_model_df_processed)[colnames(importance_rf_model_df_processed) == "Importance"] <- "Importance_RF"
}
if (METHOD_FEATURE_FLAG_NUM == 6){
importance_rf_model_df_processed$Importance <- rowMeans(importance_rf_model_df_processed)
importance_rf_model_df_processed$Feature <- rownames(importance_rf_model_df_processed)
importance_rf_model_df_processed <- subset(importance_rf_model_df_processed, select = -c(MCI, Dementia))
colnames(importance_rf_model_df_processed)[colnames(importance_rf_model_df_processed) == "Importance"] <- "Importance_RF"
}
head(importance_rf_model_df_processed)
}
From the above (the binary case), we ensure the data frames share the same structure, with the columns ‘Importance’ and ‘Feature’ in the same order.
If our case is multiclass classification, see below. Except for the XGBoost and SVM models, each model’s feature importance is computed as the maximum importance across the classes.
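For the models whose per-class importances are kept, the `MaxImportance` column can be obtained as a row-wise maximum over the class columns; a minimal sketch on a toy data frame (values illustrative, not taken from the run above):

```r
# Toy per-class importance table (values illustrative only).
imp <- data.frame(CN       = c(51.3, 10.9),
                  Dementia = c(12.1, 55.0),
                  MCI      = c(100.0, 78.6),
                  Feature  = c("cg15501526", "cg01153376"))

# Row-wise maximum over the class columns gives MaxImportance.
imp$MaxImportance <- do.call(pmax, imp[, c("CN", "Dementia", "MCI")])

# Order from most to least important, as in the tables above.
imp <- imp[order(-imp$MaxImportance), ]
```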
if(METHOD_FEATURE_FLAG == 1){
# Process the dataframe to ensure they have consistent format.
# SVM
importance_SVM_df_processed<-importance_SVM_df[,c("importance","feature")]
colnames(importance_SVM_df_processed)[colnames(importance_SVM_df_processed) == "feature"] <- "Feature"
colnames(importance_SVM_df_processed)[colnames(importance_SVM_df_processed) == "importance"] <- "Importance_SVM"
head(importance_SVM_df_processed)
# LRM
importance_model_LRM1_df_processed<-importance_model_LRM1_df
colnames(importance_model_LRM1_df_processed)[colnames(importance_model_LRM1_df_processed) == "MaxImportance"] <- "Importance_LRM1"
importance_model_LRM1_df_processed <- subset(importance_model_LRM1_df_processed, select = -c(Dementia,MCI, CN))
head(importance_model_LRM1_df_processed)
# Elastic Net
importance_elastic_net_model1_df_processed<-importance_elastic_net_model1_df
importance_elastic_net_model1_df_processed <- subset(importance_elastic_net_model1_df_processed, select = -c(Dementia,MCI, CN))
colnames(importance_elastic_net_model1_df_processed)[colnames(importance_elastic_net_model1_df_processed) == "MaxImportance"] <- "Importance_ENM1"
head(importance_elastic_net_model1_df_processed)
# XGBoost
importance_xgb_model_df_processed<-importance_xgb_model_df
importance_xgb_model_df_processed$Feature<-rownames(importance_xgb_model_df_processed)
colnames(importance_xgb_model_df_processed)[colnames(importance_xgb_model_df_processed) == "Overall"] <- "Importance_XGB"
head(importance_xgb_model_df_processed)
# RF
importance_rf_model_df_processed <- importance_rf_model_df
importance_rf_model_df_processed <- subset(importance_rf_model_df_processed, select = -c(Dementia,MCI, CN))
colnames(importance_rf_model_df_processed)[colnames(importance_rf_model_df_processed) == "MaxImportance"] <- "Importance_RF"
head(importance_rf_model_df_processed)
}
Then, let’s do the scaling; here we choose min-max scaling.
importance_list <- list(logistic = importance_model_LRM1_df_processed,
xgb = importance_xgb_model_df_processed,
elastic_net = importance_elastic_net_model1_df_processed,
rf = importance_rf_model_df_processed,
svm = importance_SVM_df_processed)
min_max_scale_Imp <- function(df){
# Min-max scale every "Importance_*" column into the [0, 1] range.
x <- df[, grepl("Importance_", colnames(df))]
df[, grepl("Importance_", colnames(df))] <- (x - min(x)) / (max(x) - min(x))
return(df)
}
for (i in seq_along(importance_list)) {
importance_list[[i]] <- min_max_scale_Imp(importance_list[[i]])
}
# Print each data frame after scaling
print(head(importance_list[[1]]))
## Feature Importance_LRM1
## 1 PC1 1.0000000
## 2 PC2 0.7857178
## 3 PC3 0.6786379
## 4 cg00962106 0.6281880
## 5 cg02225060 0.5084410
## 6 cg14710850 0.4928898
print(head(importance_list[[2]]))
## Importance_XGB Feature
## age.now 1.0000000 age.now
## cg05096415 0.5867398 cg05096415
## cg15501526 0.5396750 cg15501526
## cg00962106 0.5279227 cg00962106
## cg16652920 0.5184487 cg16652920
## cg14564293 0.5039124 cg14564293
print(head(importance_list[[3]]))
## Feature Importance_ENM1
## 1 PC1 1.0000000
## 2 PC2 0.8852268
## 3 cg00962106 0.7275966
## 4 cg02225060 0.6173652
## 5 cg02981548 0.5869110
## 6 cg23432430 0.5696065
print(head(importance_list[[4]]))
## Feature Importance_RF
## 1 cg15501526 1.0000000
## 2 cg01153376 0.7283689
## 3 cg08857872 0.5601036
## 4 cg12279734 0.5411165
## 5 cg06864789 0.4696092
## 6 cg00962106 0.4661381
print(head(importance_list[[5]]))
## Importance_SVM Feature
## 1 1.0000000 cg25879395
## 2 0.8333333 age.now
## 3 0.7500000 cg00999469
## 4 0.7500000 cg26069044
## 5 0.7500000 cg05096415
## 6 0.7500000 cg01921484
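As a quick sanity check, the scaler can be exercised on a toy data frame (the function below mirrors `min_max_scale_Imp` defined above; the `Importance_TOY` values are illustrative):

```r
# Mirror of min_max_scale_Imp from above, applied to a toy data frame.
min_max_scale_Imp <- function(df){
  x <- df[, grepl("Importance_", colnames(df))]
  df[, grepl("Importance_", colnames(df))] <- (x - min(x)) / (max(x) - min(x))
  df
}

toy <- data.frame(Feature = c("f1", "f2", "f3"),
                  Importance_TOY = c(2, 6, 10))
min_max_scale_Imp(toy)$Importance_TOY  # 0.0 0.5 1.0
```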
Now, let’s merge the scaled feature-importance data frames.
# Merge all importances
combined_importance <- Reduce(function(x, y) merge(x, y, by = "Feature", all = TRUE), importance_list)
head(combined_importance)
# Replace NA with 0
combined_importance[is.na(combined_importance)] <- 0
# Exclude DX, as it's label
combined_importance <- combined_importance %>%
filter(Feature != "DX")
# View the filtered dataframe
head(combined_importance)
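Note that the `all = TRUE` merge is a full outer join, so a feature present in one model’s table but absent from another’s gets an `NA`, which is why the `NA`s are then replaced with 0. A toy illustration with hypothetical features:

```r
# Two toy importance tables with partially overlapping features.
a <- data.frame(Feature = c("cg1", "cg2"), Importance_A = c(0.9, 0.4))
b <- data.frame(Feature = c("cg2", "cg3"), Importance_B = c(0.7, 0.2))

# Full outer join: features missing from one table appear with NA.
combined <- merge(a, b, by = "Feature", all = TRUE)
combined[is.na(combined)] <- 0  # absent features get zero importance
```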
Selecting the top number of important features based on average importance proceeds as follows.
combined_importance_AVF <- combined_importance
# Calculate average importance
combined_importance_AVF$Average_Importance <- rowMeans(combined_importance_AVF[,-1])
head(combined_importance_AVF)
combined_importance_Avg_ordered <- combined_importance_AVF[order(-combined_importance_AVF$Average_Importance),]
head(combined_importance_Avg_ordered)
# Top Number of common important features
print("the Top number of common features here is set to:")
## [1] "the Top number of common features here is set to:"
print(NUM_COMMON_FEATURES)
## [1] 20
top_Num_combined_importance_Avg_ordered <- head(combined_importance_Avg_ordered,n = NUM_COMMON_FEATURES)
print(top_Num_combined_importance_Avg_ordered)
## Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM
## 153 PC1 1.00000000 0.1866056 1.0000000 0.1240874 0.6666667
## 10 cg00962106 0.62818798 0.5279227 0.7275966 0.4661381 0.2500000
## 154 PC2 0.78571784 0.2487008 0.8852268 0.2192495 0.4166667
## 39 cg05096415 0.44533457 0.5867398 0.4148286 0.2849571 0.7500000
## 60 cg08857872 0.38088074 0.4675933 0.5311399 0.5601036 0.4166667
## 129 cg23432430 0.43874902 0.2739420 0.5696065 0.3185561 0.7500000
## 102 cg16652920 0.34495856 0.5184487 0.5211088 0.4472525 0.5000000
## 50 cg06864789 0.36409272 0.5038359 0.4605866 0.4696092 0.5000000
## 1 age.now 0.00000000 1.0000000 0.0000000 0.4537216 0.8333333
## 19 cg01921484 0.23359943 0.4484551 0.3846605 0.4169069 0.7500000
## 146 cg26948066 0.32904957 0.3759000 0.5078325 0.2620902 0.7500000
## 107 cg17186592 0.41619685 0.3415588 0.4285680 0.3530728 0.6666667
## 62 cg09584650 0.41042125 0.4571129 0.4770672 0.2305017 0.5833333
## 78 cg12279734 0.30333382 0.3066646 0.3294303 0.5411165 0.6666667
## 28 cg02981548 0.48692573 0.4094430 0.5869110 0.2257793 0.4166667
## 93 cg14710850 0.49288980 0.2709836 0.5415595 0.3530923 0.4166667
## 155 PC3 0.67863786 0.1650015 0.5045821 0.2153460 0.5000000
## 54 cg07152869 0.46401454 0.2123634 0.5392183 0.1710842 0.6666667
## 61 cg08861434 0.48302924 0.2358769 0.4931322 0.1426766 0.6666667
## 95 cg15501526 0.05232237 0.5396750 0.1653714 1.0000000 0.2500000
## Average_Importance
## 153 0.5954719
## 10 0.5199691
## 154 0.5111123
## 39 0.4963720
## 60 0.4712769
## 129 0.4701707
## 102 0.4663537
## 50 0.4596249
## 1 0.4574110
## 19 0.4467244
## 146 0.4449745
## 107 0.4412126
## 62 0.4316873
## 78 0.4294424
## 28 0.4251451
## 93 0.4150384
## 155 0.4127135
## 54 0.4106694
## 61 0.4042763
## 95 0.4014738
# Top Number of common important features' name
top_Num_combined_importance_Avg_ordered_Nam <- top_Num_combined_importance_Avg_ordered$Feature
print(top_Num_combined_importance_Avg_ordered_Nam)
## [1] "PC1" "cg00962106" "PC2" "cg05096415" "cg08857872" "cg23432430" "cg16652920"
## [8] "cg06864789" "age.now" "cg01921484" "cg26948066" "cg17186592" "cg09584650" "cg12279734"
## [15] "cg02981548" "cg14710850" "PC3" "cg07152869" "cg08861434" "cg15501526"
Visualization with a bar plot of the average feature importance.
ggplot(combined_importance_Avg_ordered, aes(x = reorder(Feature, Average_Importance), y = Average_Importance)) +
geom_bar(stat = "identity") +
coord_flip() + # Flip coordinates to make it horizontal
labs(title = "Feature Importance Sorted by Average Value",
x = "Feature",
y = "Average Importance") +
theme_minimal()
Visualization with a bar plot of the top features’ average importance.
ggplot(top_Num_combined_importance_Avg_ordered, aes(x = reorder(Feature, Average_Importance), y = Average_Importance)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = paste("Top",NUM_COMMON_FEATURES,"Feature Importance Sorted by Average Value"),
x = "Feature",
y = "Average Importance") +
theme_minimal()
The following shows how to select the top number of important features based on a specific quantile of importance (here we use the median, i.e. the 50% quantile).
Let’s create a new data frame with several quantiles of feature importance across the models,
then order it by the 50% quantile from high to low and select the top features on that basis.
quantiles <- t(apply(combined_importance[,-1], 1, function(x) quantile(x, probs = c(0,0.25, 0.5, 0.75,1))))
combined_importance_quantiles <- cbind(Feature = combined_importance$Feature, quantiles)
combined_importance_quantiles <- as.data.frame(combined_importance_quantiles)
combined_importance_quantiles$`50%` <- as.numeric(combined_importance_quantiles$`50%`)
combined_importance_quantiles$`0%` <- as.numeric(combined_importance_quantiles$`0%`)
combined_importance_quantiles$`25%` <- as.numeric(combined_importance_quantiles$`25%`)
combined_importance_quantiles$`75%` <- as.numeric(combined_importance_quantiles$`75%`)
combined_importance_quantiles$`100%` <- as.numeric(combined_importance_quantiles$`100%`)
# Sort by median importance (50th percentile)
combined_importance_quantiles <- combined_importance_quantiles[order(-combined_importance_quantiles$`50%`), ]
head(combined_importance_quantiles)
top_Num_median_features_imp <- head(combined_importance_quantiles,n = NUM_COMMON_FEATURES)
print(top_Num_median_features_imp)
## Feature 0% 25% 50% 75% 100%
## 153 PC1 0.12408737 0.18660559 0.6666667 1.0000000 1.0000000
## 10 cg00962106 0.25000000 0.46613808 0.5279227 0.6281880 0.7275966
## 102 cg16652920 0.34495856 0.44725248 0.5000000 0.5184487 0.5211088
## 155 PC3 0.16500148 0.21534602 0.5000000 0.5045821 0.6786379
## 150 cg27452255 0.00000000 0.18098580 0.4871655 0.4910563 0.5000000
## 61 cg08861434 0.14267663 0.23587688 0.4830292 0.4931322 0.6666667
## 50 cg06864789 0.36409272 0.46058663 0.4696092 0.5000000 0.5038359
## 60 cg08857872 0.38088074 0.41666667 0.4675933 0.5311399 0.5601036
## 54 cg07152869 0.17108416 0.21236341 0.4640145 0.5392183 0.6666667
## 62 cg09584650 0.23050169 0.41042125 0.4571129 0.4770672 0.5833333
## 104 cg16749614 0.03199478 0.08835466 0.4560318 0.5406656 0.5833333
## 1 age.now 0.00000000 0.00000000 0.4537216 0.8333333 1.0000000
## 39 cg05096415 0.28495711 0.41482863 0.4453346 0.5867398 0.7500000
## 129 cg23432430 0.27394198 0.31855613 0.4387490 0.5696065 0.7500000
## 19 cg01921484 0.23359943 0.38466055 0.4169069 0.4484551 0.7500000
## 21 cg02225060 0.18921852 0.23091637 0.4166667 0.5084410 0.6173652
## 28 cg02981548 0.22577927 0.40944300 0.4166667 0.4869257 0.5869110
## 93 cg14710850 0.27098363 0.35309226 0.4166667 0.4928898 0.5415595
## 116 cg19503462 0.06025222 0.19011324 0.4166667 0.4682402 0.4778176
## 154 PC2 0.21924948 0.24870077 0.4166667 0.7857178 0.8852268
top_Num_median_features_Name<-top_Num_median_features_imp$Feature
print(top_Num_median_features_Name)
## [1] "PC1" "cg00962106" "cg16652920" "PC3" "cg27452255" "cg08861434" "cg06864789"
## [8] "cg08857872" "cg07152869" "cg09584650" "cg16749614" "age.now" "cg05096415" "cg23432430"
## [15] "cg01921484" "cg02225060" "cg02981548" "cg14710850" "cg19503462" "PC2"
Visualization with a box plot.
library(tidyr)
long_df <- pivot_longer(combined_importance_quantiles,
cols = c(`0%`, `25%`, `50%`, `75%`, `100%`),
names_to = "Quantile",
values_to = "Importance")
ggplot(long_df, aes(x = reorder(Feature, Importance), y = Importance)) +
geom_boxplot() +
coord_flip() +
labs(title = "Distribution of Feature Importances",
x = "Feature",
y = "Importance") +
theme_minimal()
Visualization of the top features with a box plot.
library(tidyr)
long_df <- pivot_longer(top_Num_median_features_imp,
cols = c(`0%`, `25%`, `50%`, `75%`, `100%`),
names_to = "Quantile",
values_to = "Importance")
ggplot(long_df, aes(x = reorder(Feature, Importance), y = Importance)) +
geom_boxplot() +
coord_flip() +
labs(
title = paste("Distribution of Top",NUM_COMMON_FEATURES,"Feature Importance Sorted by Median Value"),
x = "Feature",
y = "Importance") +
theme_minimal()
The frequency / common feature importance selection proceeds as follows:
n_select_frequencyWay <- NUM_COMMON_FEATURES_SET_Frequency
combined_importance_freq_ordered_df<-combined_importance_Avg_ordered
# LRM
## All_impAvg_orderby_LRM
All_impAvg_orderby_LRM <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_LRM1),]
## top_impAvg_orderby_LRM
top_impAvg_orderby_LRM <- head(All_impAvg_orderby_LRM,n = n_select_frequencyWay)
top_impAvg_orderby_LRM_NAME <- top_impAvg_orderby_LRM$Feature
# XGB
## All_impAvg_orderby_XGB
All_impAvg_orderby_XGB <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_XGB),]
## top_impAvg_orderby_XGB
top_impAvg_orderby_XGB <- head(All_impAvg_orderby_XGB,n = n_select_frequencyWay)
top_impAvg_orderby_XGB_NAME <- top_impAvg_orderby_XGB$Feature
# ENM
## All_impAvg_orderby_ENM
All_impAvg_orderby_ENM <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_ENM1),]
## top_impAvg_orderby_ENM
top_impAvg_orderby_ENM <- head(All_impAvg_orderby_ENM,n = n_select_frequencyWay)
top_impAvg_orderby_ENM_NAME <- top_impAvg_orderby_ENM$Feature
# RF
## All_impAvg_orderby_RF
All_impAvg_orderby_RF <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_RF),]
## top_impAvg_orderby_RF
top_impAvg_orderby_RF <- head(All_impAvg_orderby_RF,n = n_select_frequencyWay)
top_impAvg_orderby_RF_NAME <- top_impAvg_orderby_RF$Feature
# SVM
## All_impAvg_orderby_SVM
All_impAvg_orderby_SVM <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_SVM),]
## top_impAvg_orderby_SVM
top_impAvg_orderby_SVM <- head(All_impAvg_orderby_SVM,n = n_select_frequencyWay)
top_impAvg_orderby_SVM_NAME <- top_impAvg_orderby_SVM$Feature
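The five per-model blocks above repeat one pattern: order the table by a model's importance column, then keep the top n feature names. A minimal sketch of the same logic as a single `lapply`, using toy data standing in for `combined_importance_freq_ordered_df` and `n_select_frequencyWay`:

```r
# Sketch: collapse the repeated "order by importance column, take top n"
# blocks into one lapply over the importance columns. Toy stand-in data:
imp_df <- data.frame(
  Feature = c("cgA", "cgB", "cgC", "cgD"),
  Importance_LRM1 = c(0.9, 0.1, 0.5, 0.3),
  Importance_XGB  = c(0.2, 0.8, 0.4, 0.6),
  stringsAsFactors = FALSE
)
n_top <- 2  # stand-in for n_select_frequencyWay
imp_cols <- c(LRM = "Importance_LRM1", XGB = "Importance_XGB")
top_names_by_model <- lapply(imp_cols, function(col) {
  head(imp_df$Feature[order(-imp_df[[col]])], n = n_top)
})
print(top_names_by_model$LRM)  # "cgA" "cgC"
```

With the report's real data, `top_names_by_model$LRM` would play the role of `top_impAvg_orderby_LRM_NAME`, and so on for the other models.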
# Combine all features into a unique collection
all_features <- unique(c(top_impAvg_orderby_LRM_NAME, top_impAvg_orderby_XGB_NAME, top_impAvg_orderby_ENM_NAME,top_impAvg_orderby_RF_NAME,top_impAvg_orderby_SVM_NAME))
models<-c("LRM","XGB","ENM","RF","SVM")
feature_matrix <- matrix(0, nrow = length(all_features), ncol = length(models),
dimnames = list(all_features, models))
# Fill the matrix indicating presence (1) or absence (0) of each feature in each model
for (feature in all_features) {
feature_matrix[feature, "LRM"] <-
as.integer(feature %in% top_impAvg_orderby_LRM_NAME)
feature_matrix[feature, "XGB"] <-
as.integer(feature %in% top_impAvg_orderby_XGB_NAME)
feature_matrix[feature, "ENM"] <-
as.integer(feature %in% top_impAvg_orderby_ENM_NAME)
feature_matrix[feature, "RF"] <-
as.integer(feature %in% top_impAvg_orderby_RF_NAME)
feature_matrix[feature, "SVM"] <-
as.integer(feature %in% top_impAvg_orderby_SVM_NAME)
}
feature_df <- as.data.frame(feature_matrix)
print(head(feature_df))
## LRM XGB ENM RF SVM
## PC1 1 0 1 0 1
## PC2 1 0 1 0 0
## PC3 1 0 1 0 0
## cg00962106 1 1 1 1 0
## cg02225060 1 0 1 0 0
## cg14710850 1 0 1 1 0
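The fill loop above can also be written without an explicit loop; a hedged sketch using `vapply` and `%in%` on toy top-feature name lists:

```r
# Sketch: vectorized construction of the 0/1 presence matrix that the
# for-loop above builds. Toy stand-ins for the per-model name vectors:
top_lists <- list(
  LRM = c("cgA", "cgB"),
  XGB = c("cgB", "cgC")
)
all_features <- unique(unlist(top_lists))
feature_matrix <- vapply(top_lists,
                         function(nm) as.integer(all_features %in% nm),
                         integer(length(all_features)))
rownames(feature_matrix) <- all_features
feature_matrix
##     LRM XGB
## cgA   1   0
## cgB   1   1
## cgC   0   1
```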
For quick reading, we count how many models each feature appears in by computing the row sums and adding the result as a new column of the data frame.
feature_df$Total_Count <- rowSums(feature_df[,1:5])
feature_df <- feature_df[order(-feature_df$Total_Count), ]
frequency_feature_df_RAW_ordered<-feature_df
print(feature_df)
## LRM XGB ENM RF SVM Total_Count
## cg00962106 1 1 1 1 0 4
## PC1 1 0 1 0 1 3
## cg14710850 1 0 1 1 0 3
## cg02981548 1 1 1 0 0 3
## cg08861434 1 0 1 0 1 3
## cg07152869 1 0 1 0 1 3
## cg05096415 1 1 0 0 1 3
## cg23432430 1 0 1 0 1 3
## cg17186592 1 0 0 1 1 3
## cg09584650 1 1 1 0 0 3
## age.now 0 1 0 1 1 3
## cg16652920 0 1 1 1 0 3
## cg06864789 0 1 1 1 0 3
## cg08857872 0 1 1 1 0 3
## cg01921484 0 1 0 1 1 3
## cg26948066 0 1 1 0 1 3
## PC2 1 0 1 0 0 2
## PC3 1 0 1 0 0 2
## cg02225060 1 0 1 0 0 2
## cg27452255 1 0 1 0 0 2
## cg19503462 1 0 1 0 0 2
## cg16749614 1 0 1 0 0 2
## cg11133939 1 0 1 0 0 2
## cg15501526 0 1 0 1 0 2
## cg25259265 0 1 0 1 0 2
## cg01128042 0 1 0 0 1 2
## cg02494911 0 1 0 1 0 2
## cg12279734 0 0 0 1 1 2
## cg00247094 1 0 0 0 0 1
## cg16715186 1 0 0 0 0 1
## cg03129555 1 0 0 0 0 1
## cg14564293 0 1 0 0 0 1
## cg04412904 0 1 0 0 0 1
## cg16771215 0 1 0 0 0 1
## cg02621446 0 1 0 0 0 1
## cg15865722 0 1 0 0 0 1
## cg03327352 0 1 0 0 0 1
## cg02372404 0 0 1 0 0 1
## cg01153376 0 0 0 1 0 1
## cg23658987 0 0 0 1 0 1
## cg14293999 0 0 0 1 0 1
## cg05570109 0 0 0 1 0 1
## cg21209485 0 0 0 1 0 1
## cg16579946 0 0 0 1 0 1
## cg14924512 0 0 0 1 0 1
## cg07523188 0 0 0 1 0 1
## cg25879395 0 0 0 0 1 1
## cg26757229 0 0 0 0 1 1
## cg26069044 0 0 0 0 1 1
## cg00999469 0 0 0 0 1 1
## cg24861747 0 0 0 0 1 1
## cg01013522 0 0 0 0 1 1
## cg05234269 0 0 0 0 1 1
## cg00616572 0 0 0 0 1 1
## cg01680303 0 0 0 0 1 1
all_features <- union(combined_importance_freq_ordered_df$Feature, rownames(feature_df))
# note: the combined table used here is the one before filtering
# Combine them based on the common feature selection method
# if a feature from the earlier importance table is absent here, add it with a value of zero
feature_df_full <- data.frame(Feature = all_features)
feature_df_full <- merge(feature_df_full, feature_df, by.x = "Feature", by.y = "row.names", all.x = TRUE)
feature_df_full[is.na(feature_df_full)] <- 0
# For top_impAvg_ordered
all_impAvg_ordered_full <- data.frame(Feature = all_features)
all_impAvg_ordered_full <- merge(combined_importance_freq_ordered_df,all_impAvg_ordered_full, by.x = "Feature", by.y = "Feature", all.x = TRUE)
all_impAvg_ordered_full[is.na(all_impAvg_ordered_full)] <- 0
all_combined_df_impAvg <- merge(feature_df_full, all_impAvg_ordered_full, by = "Feature", all = TRUE)
print(head(feature_df_full))
## Feature LRM XGB ENM RF SVM Total_Count
## 1 age.now 0 1 0 1 1 3
## 2 cg00154902 0 0 0 0 0 0
## 3 cg00247094 1 0 0 0 0 1
## 4 cg00272795 0 0 0 0 0 0
## 5 cg00322003 0 0 0 0 0 0
## 6 cg00616572 0 0 0 0 1 1
print(head(all_impAvg_ordered_full))
## Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM
## 1 age.now 0.00000000 1.0000000 0.0000000 0.45372158 0.8333333
## 2 cg00154902 0.08879263 0.2688349 0.3713159 0.33502754 0.5833333
## 3 cg00247094 0.41278095 0.2185408 0.4245031 0.23013585 0.5833333
## 4 cg00272795 0.21295491 0.1985510 0.2309999 0.09024509 0.3333333
## 5 cg00322003 0.21752832 0.1465702 0.3430531 0.27821774 0.5833333
## 6 cg00616572 0.28381319 0.1715595 0.3572845 0.17891065 0.6666667
## Average_Importance
## 1 0.4574110
## 2 0.3294609
## 3 0.3738588
## 4 0.2132168
## 5 0.3137405
## 6 0.3316469
print(head(all_combined_df_impAvg))
## Feature LRM XGB ENM RF SVM Total_Count Importance_LRM1 Importance_XGB Importance_ENM1
## 1 age.now 0 1 0 1 1 3 0.00000000 1.0000000 0.0000000
## 2 cg00154902 0 0 0 0 0 0 0.08879263 0.2688349 0.3713159
## 3 cg00247094 1 0 0 0 0 1 0.41278095 0.2185408 0.4245031
## 4 cg00272795 0 0 0 0 0 0 0.21295491 0.1985510 0.2309999
## 5 cg00322003 0 0 0 0 0 0 0.21752832 0.1465702 0.3430531
## 6 cg00616572 0 0 0 0 1 1 0.28381319 0.1715595 0.3572845
## Importance_RF Importance_SVM Average_Importance
## 1 0.45372158 0.8333333 0.4574110
## 2 0.33502754 0.5833333 0.3294609
## 3 0.23013585 0.5833333 0.3738588
## 4 0.09024509 0.3333333 0.2132168
## 5 0.27821774 0.5833333 0.3137405
## 6 0.17891065 0.6666667 0.3316469
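The zero-padding merges above follow one pattern: left-join the full feature list against a partial table, then replace the resulting `NA`s with 0. A self-contained sketch of that pattern on toy data:

```r
# Sketch of the pad-with-zeros merge: left-join a full feature list
# against a partial 0/1 table, then fill the missing rows with 0.
partial <- data.frame(LRM = c(1, 0), row.names = c("cgA", "cgB"))
full <- data.frame(Feature = c("cgA", "cgB", "cgC"))
full <- merge(full, partial, by.x = "Feature", by.y = "row.names", all.x = TRUE)
full[is.na(full)] <- 0
full
##   Feature LRM
## 1     cgA   1
## 2     cgB   0
## 3     cgC   0
```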
Choose a feature as mutually important when it appears in at least half of the models' (i.e., 3 of 5 in our case) top-feature lists.
if(METHOD_FEATURE_FLAG == 3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG == 5 || METHOD_FEATURE_FLAG==6){
df_process_mutual_FeatureName <- rownames(feature_df[feature_df$Total_Count>=3,])
df_process_mutual<-processed_data[,c("DX",df_process_mutual_FeatureName)]
print(paste("The number of final used features of common importance method:", length(df_process_mutual) - 1 ))
}
if(METHOD_FEATURE_FLAG == 1){
df_process_mutual_FeatureName <- rownames(feature_df[feature_df$Total_Count>=3,])
df_process_mutual<-processed_data_m1[,c("DX",df_process_mutual_FeatureName)]
print(paste("The number of final used features of common importance method:", length(df_process_mutual) - 1 ))
}
## [1] "The number of final used features of common importance method: 16"
print(df_process_mutual_FeatureName)
## [1] "cg00962106" "PC1" "cg14710850" "cg02981548" "cg08861434" "cg07152869" "cg05096415"
## [8] "cg23432430" "cg17186592" "cg09584650" "age.now" "cg16652920" "cg06864789" "cg08857872"
## [15] "cg01921484" "cg26948066"
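The cutoff of 3 is hard-coded in `Total_Count >= 3`; a sketch (toy data, hypothetical names) deriving the same majority threshold from the number of models instead:

```r
# Sketch: derive the majority cutoff from the model count instead of
# hard-coding Total_Count >= 3. Toy stand-in for feature_df:
toy_feature_df <- data.frame(
  LRM = c(1, 1, 0), XGB = c(1, 0, 0), ENM = c(1, 1, 0),
  RF  = c(0, 1, 1), SVM = c(1, 0, 0),
  row.names = c("cgA", "cgB", "cgC")
)
n_models <- ncol(toy_feature_df)          # 5
majority_cutoff <- ceiling(n_models / 2)  # 3
toy_feature_df$Total_Count <- rowSums(toy_feature_df[, 1:n_models])
mutual_names <- rownames(toy_feature_df[toy_feature_df$Total_Count >= majority_cutoff, ])
print(mutual_names)  # "cgA" "cgB"
```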
The importance values of these features:
Top_Frequency_Feature_importance <- combined_importance_freq_ordered_df[
combined_importance_freq_ordered_df$Feature %in% df_process_mutual_FeatureName,
]
print(Top_Frequency_Feature_importance)
## Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM
## 153 PC1 1.0000000 0.1866056 1.0000000 0.1240874 0.6666667
## 10 cg00962106 0.6281880 0.5279227 0.7275966 0.4661381 0.2500000
## 39 cg05096415 0.4453346 0.5867398 0.4148286 0.2849571 0.7500000
## 60 cg08857872 0.3808807 0.4675933 0.5311399 0.5601036 0.4166667
## 129 cg23432430 0.4387490 0.2739420 0.5696065 0.3185561 0.7500000
## 102 cg16652920 0.3449586 0.5184487 0.5211088 0.4472525 0.5000000
## 50 cg06864789 0.3640927 0.5038359 0.4605866 0.4696092 0.5000000
## 1 age.now 0.0000000 1.0000000 0.0000000 0.4537216 0.8333333
## 19 cg01921484 0.2335994 0.4484551 0.3846605 0.4169069 0.7500000
## 146 cg26948066 0.3290496 0.3759000 0.5078325 0.2620902 0.7500000
## 107 cg17186592 0.4161969 0.3415588 0.4285680 0.3530728 0.6666667
## 62 cg09584650 0.4104212 0.4571129 0.4770672 0.2305017 0.5833333
## 28 cg02981548 0.4869257 0.4094430 0.5869110 0.2257793 0.4166667
## 93 cg14710850 0.4928898 0.2709836 0.5415595 0.3530923 0.4166667
## 54 cg07152869 0.4640145 0.2123634 0.5392183 0.1710842 0.6666667
## 61 cg08861434 0.4830292 0.2358769 0.4931322 0.1426766 0.6666667
## Average_Importance
## 153 0.5954719
## 10 0.5199691
## 39 0.4963720
## 60 0.4712769
## 129 0.4701707
## 102 0.4663537
## 50 0.4596249
## 1 0.4574110
## 19 0.4467244
## 146 0.4449745
## 107 0.4412126
## 62 0.4316873
## 28 0.4251451
## 93 0.4150384
## 54 0.4106694
## 61 0.4042763
ggplot(Top_Frequency_Feature_importance, aes(x = reorder(Feature, Average_Importance), y = Average_Importance)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = "Feature Importance Selected Based on Frequency Method and Sorted by Average Value",
x = "Feature",
y = "Average Importance") +
theme_minimal()
# Check whether all features selected by the Mutual method are also in the Mean method, and print any that are not
all(df_process_mutual_FeatureName %in% top_Num_combined_importance_Avg_ordered_Nam)
## [1] TRUE
Mutual_not_in_Mean <- setdiff(df_process_mutual_FeatureName, top_Num_combined_importance_Avg_ordered_Nam)
print(Mutual_not_in_Mean)
## character(0)
Phenotype part data frame: “phenoticPart_RAW”
Raw merged data frame: “merged_df_raw”
Feature importance ordered by quantile: “combined_importance_quantiles”
Feature importance ordered by mean: “combined_importance_Avg_ordered”
Frequency / common feature data frames:
“frequency_feature_df_RAW_ordered”: the selected features’ frequencies, ordered by total frequency count.
“feature_df_full”: the frequencies of all features from the frequency-method steps; not ordered.
“all_combined_df_impAvg”: the combined table of frequency and feature importance; not ordered.
head(phenoticPart_RAW)
#
# save(NUM_COMMON_FEATURES,
# combined_importance_quantiles,
# combined_importance_Avg_ordered,
# frequency_feature_df_RAW_ordered,
# top_Num_median_features_Name,
# top_Num_combined_importance_Avg_ordered_Nam,
# file = "Part2_V8_08_top_features_5KCpGs.RData")
#
# save(processed_data_m3,processed_data_m3_df,AfterProcess_FeatureName_m3,file = "Part2_V8_08_BinaryMerged_5KCpGs.RData")
#
# save(phenoticPart_RAW, merged_df_raw, file = "PhenotypeAndMerged.RData")
The feature selection method inputs:
Number_fea_input <- INPUT_NUMBER_FEATURES
Flag_8mean <- INPUT_Method_Mean_Choose
Flag_8median <- INPUT_Method_Median_Choose
Flag_8Fequency <- INPUT_Method_Frequency_Choose
print(paste("the Top number of features here is set to:", Number_fea_input))
## [1] "the Top number of features here is set to: 250"
Flag_8mean
## [1] TRUE
Flag_8median
## [1] TRUE
Flag_8Fequency
## [1] TRUE
selected_impAvg_ordered <- head(combined_importance_Avg_ordered,n = Number_fea_input)
print(head(selected_impAvg_ordered))
## Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM
## 153 PC1 1.0000000 0.1866056 1.0000000 0.1240874 0.6666667
## 10 cg00962106 0.6281880 0.5279227 0.7275966 0.4661381 0.2500000
## 154 PC2 0.7857178 0.2487008 0.8852268 0.2192495 0.4166667
## 39 cg05096415 0.4453346 0.5867398 0.4148286 0.2849571 0.7500000
## 60 cg08857872 0.3808807 0.4675933 0.5311399 0.5601036 0.4166667
## 129 cg23432430 0.4387490 0.2739420 0.5696065 0.3185561 0.7500000
## Average_Importance
## 153 0.5954719
## 10 0.5199691
## 154 0.5111123
## 39 0.4963720
## 60 0.4712769
## 129 0.4701707
print(dim(selected_impAvg_ordered))
## [1] 155 7
selected_impAvg_ordered_NAME <- selected_impAvg_ordered$Feature
print(head(selected_impAvg_ordered_NAME))
## [1] "PC1" "cg00962106" "PC2" "cg05096415" "cg08857872" "cg23432430"
df_selected_Mean <- processed_dataFrame[,c("DX",selected_impAvg_ordered_NAME)]
print(head(df_selected_Mean))
## DX PC1 cg00962106 PC2 cg05096415 cg08857872
## 200223270003_R02C01 MCI -0.214185447 0.9124898 1.470293e-02 0.9182527 0.3395280
## 200223270003_R03C01 CN -0.172761185 0.5375751 5.745834e-02 0.5177819 0.8181845
## 200223270003_R06C01 CN -0.003667305 0.5040948 8.372861e-02 0.6288426 0.2970779
## 200223270003_R07C01 Dementia -0.186779607 0.9039029 -1.117250e-02 0.6060271 0.2954090
## 200223270006_R01C01 MCI 0.026814649 0.8961556 1.650735e-05 0.5599588 0.8935876
## 200223270006_R04C01 CN -0.037862929 0.8857597 1.571950e-02 0.5441200 0.8901338
## cg23432430 cg16652920 cg06864789 age.now cg01921484 cg26948066 cg17186592
## 200223270003_R02C01 0.9482702 0.9436000 0.05369415 82.40000 0.90985496 0.4685225 0.9230463
## 200223270003_R03C01 0.9455418 0.9431222 0.46053125 78.60000 0.90931369 0.5026045 0.8593448
## 200223270003_R06C01 0.9418716 0.9457161 0.87513655 80.40000 0.92044873 0.9101976 0.8467599
## 200223270003_R07C01 0.9426559 0.9419785 0.49020327 78.16441 0.91674311 0.9379543 0.4986373
## 200223270006_R01C01 0.9461736 0.9529417 0.47852685 62.90000 0.02943747 0.9120181 0.8978999
## 200223270006_R04C01 0.9508404 0.9492648 0.05423587 80.67796 0.89057041 0.8868608 0.9239750
## cg09584650 cg12279734 cg02981548 cg14710850 PC3 cg07152869
## 200223270003_R02C01 0.08230254 0.6435368 0.1342571 0.8048592 -0.014043316 0.8284151
## 200223270003_R03C01 0.09661586 0.1494651 0.5220037 0.8090950 0.005055871 0.5050630
## 200223270003_R06C01 0.52399749 0.8760759 0.5098965 0.8285902 0.029143653 0.8352490
## 200223270003_R07C01 0.11587211 0.8674214 0.5660985 0.8336457 -0.032302430 0.5194300
## 200223270006_R01C01 0.42115185 0.6454450 0.5678714 0.8500725 0.052947950 0.5025709
## 200223270006_R04C01 0.56043178 0.8660058 0.5079859 0.8207247 -0.008685676 0.8080916
## cg08861434 cg15501526 cg25259265 cg02225060 cg24859648 cg11133939
## 200223270003_R02C01 0.8768306 0.6362531 0.4356646 0.6828159 0.83777536 0.1282694
## 200223270003_R03C01 0.4352647 0.6319253 0.8893591 0.8265195 0.44392797 0.5920898
## 200223270003_R06C01 0.8698813 0.7435100 0.4201700 0.5209552 0.03341185 0.5127706
## 200223270003_R07C01 0.4709249 0.7756577 0.4455517 0.8078889 0.43582347 0.8474176
## 200223270006_R01C01 0.8618532 0.3230777 0.8423337 0.6084903 0.03087161 0.8589133
## 200223270006_R04C01 0.9058965 0.8342695 0.8460736 0.7638781 0.02588024 0.5246557
## cg25879395 cg02621446 cg00247094 cg02494911 cg16771215 cg24861747
## 200223270003_R02C01 0.88130864 0.8731313 0.5399349 0.3049435 0.88389723 0.3540897
## 200223270003_R03C01 0.02603438 0.8095534 0.9315640 0.2416332 0.07196933 0.4309505
## 200223270003_R06C01 0.91060615 0.7511582 0.5177874 0.2520909 0.09949974 0.8071462
## 200223270003_R07C01 0.89205942 0.8773609 0.5377765 0.2457032 0.64234023 0.3347317
## 200223270006_R01C01 0.47886249 0.2046541 0.9109309 0.8045030 0.62679274 0.3544795
## 200223270006_R04C01 0.02145248 0.7963817 0.5266535 0.7489283 0.06970175 0.5997840
## cg01153376 cg04412904 cg20913114 cg01128042 cg10240127 cg14564293
## 200223270003_R02C01 0.4872148 0.05088595 0.36510482 0.9113420 0.9250553 0.52089591
## 200223270003_R03C01 0.9639670 0.07717659 0.80382984 0.5328806 0.9403255 0.04000662
## 200223270003_R06C01 0.2242410 0.08253743 0.03158439 0.5222757 0.9056974 0.04959460
## 200223270003_R07C01 0.5155654 0.06217431 0.81256840 0.5141721 0.9396217 0.03114773
## 200223270006_R01C01 0.9588916 0.11888769 0.81502059 0.9321215 0.9262370 0.51703196
## 200223270006_R04C01 0.9586876 0.08885846 0.90468830 0.5050081 0.9240497 0.51535010
## cg16749614 cg01013522 cg16579946 cg03129555 cg02372404 cg05234269
## 200223270003_R02C01 0.8678741 0.6251168 0.6306315 0.6079616 0.03598249 0.93848584
## 200223270003_R03C01 0.8539348 0.8862821 0.6648766 0.5785498 0.02767285 0.57461229
## 200223270003_R06C01 0.5874127 0.5425308 0.6455081 0.9137818 0.03127855 0.02467208
## 200223270003_R07C01 0.5555391 0.8429862 0.8979650 0.9043041 0.55685785 0.56516794
## 200223270006_R01C01 0.8026346 0.0480531 0.6886498 0.9286357 0.02587736 0.94829529
## 200223270006_R04C01 0.7903978 0.8240222 0.6766907 0.9088564 0.02828648 0.56298286
## cg12146221 cg12228670 cg14924512 cg27452255 cg16715186 cg00616572
## 200223270003_R02C01 0.2049284 0.8632174 0.5303907 0.9001010 0.2742789 0.9335067
## 200223270003_R03C01 0.1814927 0.8496212 0.9160885 0.6593379 0.7946153 0.9214079
## 200223270003_R06C01 0.8619250 0.8738949 0.9088414 0.9012217 0.8124316 0.9113633
## 200223270003_R07C01 0.1238469 0.8362189 0.9081681 0.8898635 0.7773263 0.9160238
## 200223270006_R01C01 0.2021598 0.8079694 0.9111789 0.5779792 0.8334531 0.4861334
## 200223270006_R04C01 0.1383786 0.6966666 0.5331753 0.8809143 0.8039945 0.9067928
## cg05570109 cg00154902 cg14293999 cg17421046 cg15775217 cg09854620
## 200223270003_R02C01 0.3466611 0.5137741 0.2836710 0.9026993 0.5707441 0.5220587
## 200223270003_R03C01 0.5866750 0.8540746 0.9172023 0.9112100 0.9168327 0.8739646
## 200223270003_R06C01 0.4046471 0.8188126 0.9168166 0.8952031 0.6042521 0.8973149
## 200223270003_R07C01 0.6014355 0.4625776 0.9188336 0.9268852 0.9062231 0.8958863
## 200223270006_R01C01 0.5774881 0.4690086 0.1971116 0.1118337 0.9083515 0.9075331
## 200223270006_R04C01 0.8756826 0.4547219 0.9030919 0.4174370 0.6383270 0.9318820
## cg19503462 cg26757229 cg06378561 cg01680303 cg06715136 cg15535896
## 200223270003_R02C01 0.7951675 0.6723726 0.9389306 0.5095174 0.3400192 0.3382952
## 200223270003_R03C01 0.4537684 0.1422661 0.9377503 0.1344941 0.9259109 0.9253926
## 200223270003_R06C01 0.6997359 0.7933794 0.5154019 0.7573869 0.9079807 0.3320191
## 200223270003_R07C01 0.7189778 0.8074830 0.9403569 0.4772204 0.6782105 0.9409104
## 200223270006_R01C01 0.7301755 0.5265692 0.4956816 0.1176263 0.8369052 0.9326027
## 200223270006_R04C01 0.4207207 0.7341953 0.9268832 0.5133033 0.8807568 0.9156401
## cg00322003 cg27341708 cg03084184 cg26219488 cg18339359 cg06697310
## 200223270003_R02C01 0.1759911 0.48846610 0.8162981 0.9336638 0.8824858 0.8454609
## 200223270003_R03C01 0.5702070 0.02613847 0.7877128 0.9134707 0.9040272 0.8653044
## 200223270003_R06C01 0.3077122 0.86893582 0.4546397 0.9261878 0.8552121 0.2405168
## 200223270003_R07C01 0.6104341 0.02642300 0.7812413 0.9217866 0.3073106 0.8479193
## 200223270006_R01C01 0.6147419 0.47573455 0.7818230 0.4929692 0.8973742 0.8206613
## 200223270006_R04C01 0.2293759 0.89411974 0.7725853 0.9431574 0.2292800 0.7839595
## cg10369879 cg10738648 cg06536614 cg26069044 cg20685672 cg03327352
## 200223270003_R02C01 0.9218784 0.44931577 0.5824474 0.92401867 0.67121006 0.8851712
## 200223270003_R03C01 0.3149306 0.49894016 0.5746694 0.94072227 0.79320906 0.8786878
## 200223270003_R06C01 0.9141081 0.05552024 0.5773468 0.93321315 0.66136456 0.3042310
## 200223270003_R07C01 0.9054415 0.03730440 0.5848917 0.56567694 0.80838304 0.8273211
## 200223270006_R01C01 0.2917862 0.54952781 0.5669919 0.94369927 0.08291414 0.8774082
## 200223270006_R04C01 0.9200403 0.59358167 0.5718514 0.02040391 0.84460055 0.8829492
## cg00999469 cg23658987 cg05841700 cg01667144 cg15865722 cg13885788
## 200223270003_R02C01 0.3274080 0.79757644 0.2923544 0.8971484 0.89438595 0.9380618
## 200223270003_R03C01 0.2857719 0.07511718 0.9146488 0.3175389 0.90194372 0.9369476
## 200223270003_R06C01 0.2499229 0.10177571 0.3737990 0.9238364 0.92118977 0.5163017
## 200223270003_R07C01 0.2819622 0.46747992 0.5046468 0.8739442 0.09230759 0.9183376
## 200223270006_R01C01 0.2933539 0.76831297 0.8419031 0.2931961 0.93422668 0.5525542
## 200223270006_R04C01 0.2966623 0.08988532 0.9286652 0.8616530 0.92220002 0.9328289
## cg14527649 cg23161429 cg20370184 cg18821122 cg07523188 cg12534577
## 200223270003_R02C01 0.2678912 0.8956965 0.37710950 0.9291309 0.7509183 0.8585231
## 200223270003_R03C01 0.7954683 0.9099619 0.05737964 0.5901603 0.1524386 0.8493466
## 200223270003_R06C01 0.8350610 0.8833895 0.04740505 0.5779620 0.7127592 0.8395241
## 200223270003_R07C01 0.8428684 0.9134709 0.83572095 0.9251431 0.8464983 0.8511384
## 200223270006_R01C01 0.8231348 0.8738558 0.04056608 0.9217018 0.7847738 0.8804655
## 200223270006_R04C01 0.8022444 0.9104210 0.04038589 0.5412250 0.8231277 0.3029013
## cg02356645 cg03982462 cg04248279 cg13080267 cg27639199 cg08198851
## 200223270003_R02C01 0.5105903 0.8562777 0.8534976 0.78936656 0.67515415 0.6578905
## 200223270003_R03C01 0.5833923 0.6023731 0.8458854 0.78371483 0.67552763 0.6578186
## 200223270003_R06C01 0.5701428 0.8778458 0.8332786 0.09436069 0.06233093 0.1272153
## 200223270003_R07C01 0.5683381 0.8860227 0.3303204 0.09351259 0.05701332 0.8351465
## 200223270006_R01C01 0.5233692 0.8703107 0.5966878 0.45173796 0.05037694 0.8791156
## 200223270006_R04C01 0.9188670 0.8792860 0.8939599 0.49866715 0.08144161 0.1423737
## cg11331837 cg24873924 cg20507276 cg25561557 cg22274273 cg12682323
## 200223270003_R02C01 0.03692842 0.3060635 0.12238910 0.76736369 0.4209386 0.9397956
## 200223270003_R03C01 0.57150125 0.8640985 0.38721972 0.03851635 0.4246379 0.9003940
## 200223270003_R06C01 0.03182862 0.8259149 0.47978438 0.47259480 0.4196796 0.9157877
## 200223270003_R07C01 0.03832164 0.8333940 0.02261996 0.43364249 0.4164100 0.9048877
## 200223270006_R01C01 0.93008298 0.8761177 0.37465798 0.46211439 0.7951105 0.1065347
## 200223270006_R04C01 0.54004452 0.8585363 0.03570795 0.44651530 0.0229810 0.8836232
## cg17738613 cg21209485 cg03088219 cg03660162 cg10750306 cg27272246
## 200223270003_R02C01 0.6879612 0.8865053 0.844002862 0.8691767 0.04919915 0.8615873
## 200223270003_R03C01 0.6582258 0.8714878 0.007435243 0.5160770 0.55160081 0.8705287
## 200223270003_R06C01 0.1022257 0.2292550 0.120155222 0.9026304 0.54694332 0.8103777
## 200223270003_R07C01 0.8960156 0.2351526 0.826554308 0.5305691 0.59824543 0.0310881
## 200223270006_R01C01 0.8850702 0.8882046 0.066294915 0.9257451 0.53158639 0.7686536
## 200223270006_R04C01 0.8481916 0.2292483 0.574738383 0.8935772 0.05646838 0.4403542
## cg11438323 cg12738248 cg21854924 cg20139683 cg16178271 cg07028768
## 200223270003_R02C01 0.4863471 0.85430866 0.8729132 0.8717075 0.6445416 0.4496851
## 200223270003_R03C01 0.8984559 0.88010292 0.7162342 0.9059433 0.6178075 0.8536078
## 200223270003_R06C01 0.8722772 0.51121855 0.7520990 0.8962554 0.6641952 0.8356936
## 200223270003_R07C01 0.5026756 0.09131476 0.8641284 0.9218012 0.7148058 0.4245893
## 200223270006_R01C01 0.8809646 0.91529345 0.6498895 0.1708472 0.6138954 0.8835151
## 200223270006_R04C01 0.8717937 0.91911405 0.5943113 0.1067122 0.9414188 0.4514661
## cg26474732 cg00675157 cg23916408 cg05321907 cg17429539 cg06950937
## 200223270003_R02C01 0.7843252 0.9188438 0.1942275 0.2880477 0.7860900 0.8910968
## 200223270003_R03C01 0.8184088 0.9242325 0.9154993 0.1782629 0.7100923 0.2889345
## 200223270003_R06C01 0.7358417 0.9254708 0.8886255 0.8427929 0.7660838 0.9143801
## 200223270003_R07C01 0.7509296 0.5447244 0.8872447 0.8320504 0.6984969 0.8891079
## 200223270006_R01C01 0.8294938 0.5173554 0.2219945 0.2422218 0.6508597 0.8868617
## 200223270006_R04C01 0.8033167 0.9247232 0.1520624 0.2429551 0.2828452 0.9093273
## cg14240646 cg27086157 cg25758034 cg11247378 cg19377607 cg07480176
## 200223270003_R02C01 0.5391334 0.9224112 0.6114028 0.1591185 0.05377464 0.5171664
## 200223270003_R03C01 0.2538363 0.9219304 0.6649219 0.7874849 0.90570746 0.3760452
## 200223270003_R06C01 0.1864902 0.3224986 0.2393844 0.4807942 0.06636174 0.6998389
## 200223270003_R07C01 0.6402007 0.3455486 0.7071501 0.4537348 0.68788639 0.2189042
## 200223270006_R01C01 0.7696079 0.8988962 0.2301078 0.1537079 0.06338988 0.5570021
## 200223270006_R04C01 0.1490028 0.9159217 0.6891513 0.1686356 0.91551446 0.4501196
## cg27577781 cg11187460 cg03071582 cg12284872 cg02932958 cg12012426
## 200223270003_R02C01 0.8143535 0.03672179 0.9187811 0.8008333 0.7901008 0.9165048
## 200223270003_R03C01 0.8113185 0.92516409 0.5844421 0.7414569 0.4210489 0.9434768
## 200223270003_R06C01 0.8144274 0.03109553 0.6245558 0.7725267 0.3825995 0.9220044
## 200223270003_R07C01 0.7970617 0.53283119 0.9283683 0.7573369 0.7617081 0.9241284
## 200223270006_R01C01 0.8640044 0.54038146 0.5715416 0.7201607 0.8431126 0.9327143
## 200223270006_R04C01 0.8840237 0.91096169 0.6534650 0.8021446 0.7610084 0.9271167
## cg06118351 cg00696044 cg25436480 cg02320265 cg11227702 cg18819889
## 200223270003_R02C01 0.36339400 0.55608424 0.84251599 0.8853213 0.86486075 0.9156157
## 200223270003_R03C01 0.47148604 0.07552381 0.49940321 0.4686314 0.49184121 0.9004455
## 200223270003_R06C01 0.86559618 0.79270858 0.34943119 0.4838749 0.02543724 0.9054439
## 200223270003_R07C01 0.83494303 0.03548419 0.85244913 0.8986848 0.45150971 0.9089935
## 200223270006_R01C01 0.02632111 0.10714386 0.44545117 0.8987560 0.89086877 0.9065397
## 200223270006_R04C01 0.83329300 0.18420803 0.02575036 0.4768520 0.87675947 0.9242767
## cg06112204 cg19512141 cg24506579 cg00272795 cg21697769 cg12776173
## 200223270003_R02C01 0.5251592 0.8209161 0.5244337 0.46365138 0.8946108 0.10388038
## 200223270003_R03C01 0.8773488 0.7903543 0.5794845 0.82839260 0.2822953 0.87306345
## 200223270003_R06C01 0.8867975 0.8404684 0.9427785 0.07231279 0.8698740 0.70094907
## 200223270003_R07C01 0.5613799 0.2202759 0.9323844 0.78303831 0.9134887 0.11367159
## 200223270006_R01C01 0.9184122 0.8059589 0.9185355 0.78219952 0.2683820 0.09458405
## 200223270006_R04C01 0.9152514 0.7020247 0.4332642 0.44408249 0.2765740 0.86532175
## cg07138269 cg17906851 cg08779649 cg10985055 cg08584917 cg04664583
## 200223270003_R02C01 0.5002290 0.9488392 0.44449401 0.8518169 0.5663205 0.5572814
## 200223270003_R03C01 0.9426707 0.9529718 0.45076825 0.8631895 0.9019732 0.5881190
## 200223270003_R06C01 0.5057781 0.6462151 0.04810217 0.5456633 0.9187789 0.9352717
## 200223270003_R07C01 0.9400527 0.9553497 0.42715969 0.8825100 0.6007449 0.9350230
## 200223270006_R01C01 0.9321602 0.6222117 0.89313476 0.8841690 0.9069098 0.9424588
## 200223270006_R04C01 0.9333501 0.6441202 0.59523771 0.8407797 0.9263584 0.9379537
## cg01933473 cg00689685 cg14307563 cg12784167 cg24851651 cg15633912
## 200223270003_R02C01 0.2589014 0.7019389 0.1855966 0.81503498 0.03674702 0.1605530
## 200223270003_R03C01 0.6726133 0.8634268 0.8916957 0.02811410 0.05358297 0.9333421
## 200223270003_R06C01 0.2642560 0.6378795 0.8750052 0.03073269 0.05968923 0.8737362
## 200223270003_R07C01 0.1978068 0.8624541 0.8975663 0.84775699 0.60864179 0.9137334
## 200223270006_R01C01 0.7599441 0.6361891 0.8762842 0.83825789 0.08825834 0.9169706
## 200223270006_R04C01 0.7405661 0.6356260 0.9168614 0.45475291 0.91932068 0.8890004
## cg12466610 cg16788319 cg20678988 cg01413796 cg01549082
## 200223270003_R02C01 0.05767659 0.9379870 0.8438718 0.1345128 0.2924138
## 200223270003_R03C01 0.59131778 0.8913429 0.8548886 0.2830672 0.7065693
## 200223270003_R06C01 0.06939623 0.8680680 0.7786685 0.8194681 0.2895440
## 200223270003_R07C01 0.04527733 0.8811444 0.8260541 0.9007710 0.6422955
## 200223270006_R01C01 0.05212904 0.3123481 0.3295384 0.2603027 0.8471236
## 200223270006_R04C01 0.05104033 0.2995627 0.8541667 0.9207672 0.6949888
dim(df_selected_Mean)
## [1] 648 156
print(selected_impAvg_ordered_NAME)
## [1] "PC1" "cg00962106" "PC2" "cg05096415" "cg08857872" "cg23432430" "cg16652920"
## [8] "cg06864789" "age.now" "cg01921484" "cg26948066" "cg17186592" "cg09584650" "cg12279734"
## [15] "cg02981548" "cg14710850" "PC3" "cg07152869" "cg08861434" "cg15501526" "cg25259265"
## [22] "cg02225060" "cg24859648" "cg11133939" "cg25879395" "cg02621446" "cg00247094" "cg02494911"
## [29] "cg16771215" "cg24861747" "cg01153376" "cg04412904" "cg20913114" "cg01128042" "cg10240127"
## [36] "cg14564293" "cg16749614" "cg01013522" "cg16579946" "cg03129555" "cg02372404" "cg05234269"
## [43] "cg12146221" "cg12228670" "cg14924512" "cg27452255" "cg16715186" "cg00616572" "cg05570109"
## [50] "cg00154902" "cg14293999" "cg17421046" "cg15775217" "cg09854620" "cg19503462" "cg26757229"
## [57] "cg06378561" "cg01680303" "cg06715136" "cg15535896" "cg00322003" "cg27341708" "cg03084184"
## [64] "cg26219488" "cg18339359" "cg06697310" "cg10369879" "cg10738648" "cg06536614" "cg26069044"
## [71] "cg20685672" "cg03327352" "cg00999469" "cg23658987" "cg05841700" "cg01667144" "cg15865722"
## [78] "cg13885788" "cg14527649" "cg23161429" "cg20370184" "cg18821122" "cg07523188" "cg12534577"
## [85] "cg02356645" "cg03982462" "cg04248279" "cg13080267" "cg27639199" "cg08198851" "cg11331837"
## [92] "cg24873924" "cg20507276" "cg25561557" "cg22274273" "cg12682323" "cg17738613" "cg21209485"
## [99] "cg03088219" "cg03660162" "cg10750306" "cg27272246" "cg11438323" "cg12738248" "cg21854924"
## [106] "cg20139683" "cg16178271" "cg07028768" "cg26474732" "cg00675157" "cg23916408" "cg05321907"
## [113] "cg17429539" "cg06950937" "cg14240646" "cg27086157" "cg25758034" "cg11247378" "cg19377607"
## [120] "cg07480176" "cg27577781" "cg11187460" "cg03071582" "cg12284872" "cg02932958" "cg12012426"
## [127] "cg06118351" "cg00696044" "cg25436480" "cg02320265" "cg11227702" "cg18819889" "cg06112204"
## [134] "cg19512141" "cg24506579" "cg00272795" "cg21697769" "cg12776173" "cg07138269" "cg17906851"
## [141] "cg08779649" "cg10985055" "cg08584917" "cg04664583" "cg01933473" "cg00689685" "cg14307563"
## [148] "cg12784167" "cg24851651" "cg15633912" "cg12466610" "cg16788319" "cg20678988" "cg01413796"
## [155] "cg01549082"
output_mean_process<-processed_data[,c("DX",selected_impAvg_ordered_NAME)]
print(head(output_mean_process))
## # A tibble: 6 × 156
## DX PC1 cg00962106 PC2 cg05096415 cg08857872 cg23432430 cg16652920 cg06864789
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 MCI -0.214 0.912 0.0147 0.918 0.340 0.948 0.944 0.0537
## 2 CN -0.173 0.538 0.0575 0.518 0.818 0.946 0.943 0.461
## 3 CN -0.00367 0.504 0.0837 0.629 0.297 0.942 0.946 0.875
## 4 Dementia -0.187 0.904 -0.0112 0.606 0.295 0.943 0.942 0.490
## 5 MCI 0.0268 0.896 0.0000165 0.560 0.894 0.946 0.953 0.479
## 6 CN -0.0379 0.886 0.0157 0.544 0.890 0.951 0.949 0.0542
## # ℹ 147 more variables: age.now <dbl>, cg01921484 <dbl>, cg26948066 <dbl>, cg17186592 <dbl>,
## # cg09584650 <dbl>, cg12279734 <dbl>, cg02981548 <dbl>, cg14710850 <dbl>, PC3 <dbl>,
## # cg07152869 <dbl>, cg08861434 <dbl>, cg15501526 <dbl>, cg25259265 <dbl>, cg02225060 <dbl>,
## # cg24859648 <dbl>, cg11133939 <dbl>, cg25879395 <dbl>, cg02621446 <dbl>, cg00247094 <dbl>,
## # cg02494911 <dbl>, cg16771215 <dbl>, cg24861747 <dbl>, cg01153376 <dbl>, cg04412904 <dbl>,
## # cg20913114 <dbl>, cg01128042 <dbl>, cg10240127 <dbl>, cg14564293 <dbl>, cg16749614 <dbl>,
## # cg01013522 <dbl>, cg16579946 <dbl>, cg03129555 <dbl>, cg02372404 <dbl>, cg05234269 <dbl>, …
dim(output_mean_process)
## [1] 648 156
Selected_median_imp <- head(combined_importance_quantiles,n = Number_fea_input)
print(head(Selected_median_imp))
## Feature 0% 25% 50% 75% 100%
## 153 PC1 0.1240874 0.1866056 0.6666667 1.0000000 1.0000000
## 10 cg00962106 0.2500000 0.4661381 0.5279227 0.6281880 0.7275966
## 102 cg16652920 0.3449586 0.4472525 0.5000000 0.5184487 0.5211088
## 155 PC3 0.1650015 0.2153460 0.5000000 0.5045821 0.6786379
## 150 cg27452255 0.0000000 0.1809858 0.4871655 0.4910563 0.5000000
## 61 cg08861434 0.1426766 0.2358769 0.4830292 0.4931322 0.6666667
Selected_median_imp_Name<-Selected_median_imp$Feature
print(head(Selected_median_imp_Name))
## [1] "PC1" "cg00962106" "cg16652920" "PC3" "cg27452255" "cg08861434"
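The quantile columns above (0%, 25%, 50%, 75%, 100%) summarize each feature's normalized importance across the five models; features are then ranked by the median (50%) column. A minimal sketch of this step on made-up values:

```r
# Hypothetical normalized importances for two features across 5 models
imp <- rbind(cg_X = c(0.1, 0.5, 0.6, 0.7, 0.9),
             cg_Y = c(0.0, 0.2, 0.3, 0.4, 1.0))
# Per-feature quantiles (0%, 25%, 50%, 75%, 100%) across models
q <- t(apply(imp, 1, quantile, probs = c(0, 0.25, 0.5, 0.75, 1)))
q_df <- data.frame(Feature = rownames(q), q, check.names = FALSE)
# Rank features by their median (50%) importance, descending
q_df <- q_df[order(-q_df[["50%"]]), ]
print(q_df)
```

Here `cg_X` (median 0.6) ranks above `cg_Y` (median 0.3).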
df_selected_Median <- processed_dataFrame[,c("DX",Selected_median_imp_Name)]
output_median_feature<-processed_data[,c("DX",Selected_median_imp_Name)]
print(head(df_selected_Median))
## DX PC1 cg00962106 cg16652920 PC3 cg27452255
## 200223270003_R02C01 MCI -0.214185447 0.9124898 0.9436000 -0.014043316 0.9001010
## 200223270003_R03C01 CN -0.172761185 0.5375751 0.9431222 0.005055871 0.6593379
## 200223270003_R06C01 CN -0.003667305 0.5040948 0.9457161 0.029143653 0.9012217
## 200223270003_R07C01 Dementia -0.186779607 0.9039029 0.9419785 -0.032302430 0.8898635
## 200223270006_R01C01 MCI 0.026814649 0.8961556 0.9529417 0.052947950 0.5779792
## 200223270006_R04C01 CN -0.037862929 0.8857597 0.9492648 -0.008685676 0.8809143
## cg08861434 cg06864789 cg08857872 cg07152869 cg09584650 cg16749614 age.now
## 200223270003_R02C01 0.8768306 0.05369415 0.3395280 0.8284151 0.08230254 0.8678741 82.40000
## 200223270003_R03C01 0.4352647 0.46053125 0.8181845 0.5050630 0.09661586 0.8539348 78.60000
## 200223270003_R06C01 0.8698813 0.87513655 0.2970779 0.8352490 0.52399749 0.5874127 80.40000
## 200223270003_R07C01 0.4709249 0.49020327 0.2954090 0.5194300 0.11587211 0.5555391 78.16441
## 200223270006_R01C01 0.8618532 0.47852685 0.8935876 0.5025709 0.42115185 0.8026346 62.90000
## 200223270006_R04C01 0.9058965 0.05423587 0.8901338 0.8080916 0.56043178 0.7903978 80.67796
## cg05096415 cg23432430 cg01921484 cg02225060 cg02981548 cg14710850
## 200223270003_R02C01 0.9182527 0.9482702 0.90985496 0.6828159 0.1342571 0.8048592
## 200223270003_R03C01 0.5177819 0.9455418 0.90931369 0.8265195 0.5220037 0.8090950
## 200223270003_R06C01 0.6288426 0.9418716 0.92044873 0.5209552 0.5098965 0.8285902
## 200223270003_R07C01 0.6060271 0.9426559 0.91674311 0.8078889 0.5660985 0.8336457
## 200223270006_R01C01 0.5599588 0.9461736 0.02943747 0.6084903 0.5678714 0.8500725
## 200223270006_R04C01 0.5441200 0.9508404 0.89057041 0.7638781 0.5079859 0.8207247
## cg19503462 PC2 cg17186592 cg00247094 cg11133939 cg25259265
## 200223270003_R02C01 0.7951675 1.470293e-02 0.9230463 0.5399349 0.1282694 0.4356646
## 200223270003_R03C01 0.4537684 5.745834e-02 0.8593448 0.9315640 0.5920898 0.8893591
## 200223270003_R06C01 0.6997359 8.372861e-02 0.8467599 0.5177874 0.5127706 0.4201700
## 200223270003_R07C01 0.7189778 -1.117250e-02 0.4986373 0.5377765 0.8474176 0.4455517
## 200223270006_R01C01 0.7301755 1.650735e-05 0.8978999 0.9109309 0.8589133 0.8423337
## 200223270006_R04C01 0.4207207 1.571950e-02 0.9239750 0.5266535 0.5246557 0.8460736
## cg16715186 cg05570109 cg26948066 cg02494911 cg14293999 cg14924512
## 200223270003_R02C01 0.2742789 0.3466611 0.4685225 0.3049435 0.2836710 0.5303907
## 200223270003_R03C01 0.7946153 0.5866750 0.5026045 0.2416332 0.9172023 0.9160885
## 200223270003_R06C01 0.8124316 0.4046471 0.9101976 0.2520909 0.9168166 0.9088414
## 200223270003_R07C01 0.7773263 0.6014355 0.9379543 0.2457032 0.9188336 0.9081681
## 200223270006_R01C01 0.8334531 0.5774881 0.9120181 0.8045030 0.1971116 0.9111789
## 200223270006_R04C01 0.8039945 0.8756826 0.8868608 0.7489283 0.9030919 0.5331753
## cg02621446 cg03129555 cg04412904 cg26219488 cg00154902 cg20913114
## 200223270003_R02C01 0.8731313 0.6079616 0.05088595 0.9336638 0.5137741 0.36510482
## 200223270003_R03C01 0.8095534 0.5785498 0.07717659 0.9134707 0.8540746 0.80382984
## 200223270003_R06C01 0.7511582 0.9137818 0.08253743 0.9261878 0.8188126 0.03158439
## 200223270003_R07C01 0.8773609 0.9043041 0.06217431 0.9217866 0.4625776 0.81256840
## 200223270006_R01C01 0.2046541 0.9286357 0.11888769 0.4929692 0.4690086 0.81502059
## 200223270006_R04C01 0.7963817 0.9088564 0.08885846 0.9431574 0.4547219 0.90468830
## cg03084184 cg12279734 cg01153376 cg16771215 cg04248279 cg06536614
## 200223270003_R02C01 0.8162981 0.6435368 0.4872148 0.88389723 0.8534976 0.5824474
## 200223270003_R03C01 0.7877128 0.1494651 0.9639670 0.07196933 0.8458854 0.5746694
## 200223270003_R06C01 0.4546397 0.8760759 0.2242410 0.09949974 0.8332786 0.5773468
## 200223270003_R07C01 0.7812413 0.8674214 0.5155654 0.64234023 0.3303204 0.5848917
## 200223270006_R01C01 0.7818230 0.6454450 0.9588916 0.62679274 0.5966878 0.5669919
## 200223270006_R04C01 0.7725853 0.8660058 0.9586876 0.06970175 0.8939599 0.5718514
## cg09854620 cg06378561 cg24859648 cg10240127 cg12228670 cg03327352
## 200223270003_R02C01 0.5220587 0.9389306 0.83777536 0.9250553 0.8632174 0.8851712
## 200223270003_R03C01 0.8739646 0.9377503 0.44392797 0.9403255 0.8496212 0.8786878
## 200223270003_R06C01 0.8973149 0.5154019 0.03341185 0.9056974 0.8738949 0.3042310
## 200223270003_R07C01 0.8958863 0.9403569 0.43582347 0.9396217 0.8362189 0.8273211
## 200223270006_R01C01 0.9075331 0.4956816 0.03087161 0.9262370 0.8079694 0.8774082
## 200223270006_R04C01 0.9318820 0.9268832 0.02588024 0.9240497 0.6966666 0.8829492
## cg12146221 cg03982462 cg05841700 cg15865722 cg07523188 cg11227702
## 200223270003_R02C01 0.2049284 0.8562777 0.2923544 0.89438595 0.7509183 0.86486075
## 200223270003_R03C01 0.1814927 0.6023731 0.9146488 0.90194372 0.1524386 0.49184121
## 200223270003_R06C01 0.8619250 0.8778458 0.3737990 0.92118977 0.7127592 0.02543724
## 200223270003_R07C01 0.1238469 0.8860227 0.5046468 0.09230759 0.8464983 0.45150971
## 200223270006_R01C01 0.2021598 0.8703107 0.8419031 0.93422668 0.7847738 0.89086877
## 200223270006_R04C01 0.1383786 0.8792860 0.9286652 0.92220002 0.8231277 0.87675947
## cg10369879 cg16579946 cg24861747 cg14564293 cg01128042 cg00616572
## 200223270003_R02C01 0.9218784 0.6306315 0.3540897 0.52089591 0.9113420 0.9335067
## 200223270003_R03C01 0.3149306 0.6648766 0.4309505 0.04000662 0.5328806 0.9214079
## 200223270003_R06C01 0.9141081 0.6455081 0.8071462 0.04959460 0.5222757 0.9113633
## 200223270003_R07C01 0.9054415 0.8979650 0.3347317 0.03114773 0.5141721 0.9160238
## 200223270006_R01C01 0.2917862 0.6886498 0.3544795 0.51703196 0.9321215 0.4861334
## 200223270006_R04C01 0.9200403 0.6766907 0.5997840 0.51535010 0.5050081 0.9067928
## cg08198851 cg17421046 cg15535896 cg18339359 cg00322003 cg02372404
## 200223270003_R02C01 0.6578905 0.9026993 0.3382952 0.8824858 0.1759911 0.03598249
## 200223270003_R03C01 0.6578186 0.9112100 0.9253926 0.9040272 0.5702070 0.02767285
## 200223270003_R06C01 0.1272153 0.8952031 0.3320191 0.8552121 0.3077122 0.03127855
## 200223270003_R07C01 0.8351465 0.9268852 0.9409104 0.3073106 0.6104341 0.55685785
## 200223270006_R01C01 0.8791156 0.1118337 0.9326027 0.8973742 0.6147419 0.02587736
## 200223270006_R04C01 0.1423737 0.4174370 0.9156401 0.2292800 0.2293759 0.02828648
## cg11331837 cg23658987 cg10738648 cg25561557 cg01667144 cg05234269
## 200223270003_R02C01 0.03692842 0.79757644 0.44931577 0.76736369 0.8971484 0.93848584
## 200223270003_R03C01 0.57150125 0.07511718 0.49894016 0.03851635 0.3175389 0.57461229
## 200223270003_R06C01 0.03182862 0.10177571 0.05552024 0.47259480 0.9238364 0.02467208
## 200223270003_R07C01 0.03832164 0.46747992 0.03730440 0.43364249 0.8739442 0.56516794
## 200223270006_R01C01 0.93008298 0.76831297 0.54952781 0.46211439 0.2931961 0.94829529
## 200223270006_R04C01 0.54004452 0.08988532 0.59358167 0.44651530 0.8616530 0.56298286
## cg12534577 cg06118351 cg13885788 cg10750306 cg15775217 cg01013522
## 200223270003_R02C01 0.8585231 0.36339400 0.9380618 0.04919915 0.5707441 0.6251168
## 200223270003_R03C01 0.8493466 0.47148604 0.9369476 0.55160081 0.9168327 0.8862821
## 200223270003_R06C01 0.8395241 0.86559618 0.5163017 0.54694332 0.6042521 0.5425308
## 200223270003_R07C01 0.8511384 0.83494303 0.9183376 0.59824543 0.9062231 0.8429862
## 200223270006_R01C01 0.8804655 0.02632111 0.5525542 0.53158639 0.9083515 0.0480531
## 200223270006_R04C01 0.3029013 0.83329300 0.9328289 0.05646838 0.6383270 0.8240222
## cg26474732 cg27086157 cg03088219 cg15501526 cg27577781 cg11438323
## 200223270003_R02C01 0.7843252 0.9224112 0.844002862 0.6362531 0.8143535 0.4863471
## 200223270003_R03C01 0.8184088 0.9219304 0.007435243 0.6319253 0.8113185 0.8984559
## 200223270003_R06C01 0.7358417 0.3224986 0.120155222 0.7435100 0.8144274 0.8722772
## 200223270003_R07C01 0.7509296 0.3455486 0.826554308 0.7756577 0.7970617 0.5026756
## 200223270006_R01C01 0.8294938 0.8988962 0.066294915 0.3230777 0.8640044 0.8809646
## 200223270006_R04C01 0.8033167 0.9159217 0.574738383 0.8342695 0.8840237 0.8717937
## cg06715136 cg17738613 cg01680303 cg06697310 cg22274273 cg12738248
## 200223270003_R02C01 0.3400192 0.6879612 0.5095174 0.8454609 0.4209386 0.85430866
## 200223270003_R03C01 0.9259109 0.6582258 0.1344941 0.8653044 0.4246379 0.88010292
## 200223270003_R06C01 0.9079807 0.1022257 0.7573869 0.2405168 0.4196796 0.51121855
## 200223270003_R07C01 0.6782105 0.8960156 0.4772204 0.8479193 0.4164100 0.09131476
## 200223270006_R01C01 0.8369052 0.8850702 0.1176263 0.8206613 0.7951105 0.91529345
## 200223270006_R04C01 0.8807568 0.8481916 0.5133033 0.7839595 0.0229810 0.91911405
## cg21854924 cg14240646 cg03071582 cg24873924 cg17429539 cg06950937
## 200223270003_R02C01 0.8729132 0.5391334 0.9187811 0.3060635 0.7860900 0.8910968
## 200223270003_R03C01 0.7162342 0.2538363 0.5844421 0.8640985 0.7100923 0.2889345
## 200223270003_R06C01 0.7520990 0.1864902 0.6245558 0.8259149 0.7660838 0.9143801
## 200223270003_R07C01 0.8641284 0.6402007 0.9283683 0.8333940 0.6984969 0.8891079
## 200223270006_R01C01 0.6498895 0.7696079 0.5715416 0.8761177 0.6508597 0.8868617
## 200223270006_R04C01 0.5943113 0.1490028 0.6534650 0.8585363 0.2828452 0.9093273
## cg13080267 cg27272246 cg27341708 cg18821122 cg12682323 cg12012426
## 200223270003_R02C01 0.78936656 0.8615873 0.48846610 0.9291309 0.9397956 0.9165048
## 200223270003_R03C01 0.78371483 0.8705287 0.02613847 0.5901603 0.9003940 0.9434768
## 200223270003_R06C01 0.09436069 0.8103777 0.86893582 0.5779620 0.9157877 0.9220044
## 200223270003_R07C01 0.09351259 0.0310881 0.02642300 0.9251431 0.9048877 0.9241284
## 200223270006_R01C01 0.45173796 0.7686536 0.47573455 0.9217018 0.1065347 0.9327143
## 200223270006_R04C01 0.49866715 0.4403542 0.89411974 0.5412250 0.8836232 0.9271167
## cg05321907 cg20139683 cg20685672 cg26757229 cg25436480 cg23916408
## 200223270003_R02C01 0.2880477 0.8717075 0.67121006 0.6723726 0.84251599 0.1942275
## 200223270003_R03C01 0.1782629 0.9059433 0.79320906 0.1422661 0.49940321 0.9154993
## 200223270003_R06C01 0.8427929 0.8962554 0.66136456 0.7933794 0.34943119 0.8886255
## 200223270003_R07C01 0.8320504 0.9218012 0.80838304 0.8074830 0.85244913 0.8872447
## 200223270006_R01C01 0.2422218 0.1708472 0.08291414 0.5265692 0.44545117 0.2219945
## 200223270006_R04C01 0.2429551 0.1067122 0.84460055 0.7341953 0.02575036 0.1520624
## cg20507276 cg02356645 cg07028768 cg00272795 cg25758034 cg16178271
## 200223270003_R02C01 0.12238910 0.5105903 0.4496851 0.46365138 0.6114028 0.6445416
## 200223270003_R03C01 0.38721972 0.5833923 0.8536078 0.82839260 0.6649219 0.6178075
## 200223270003_R06C01 0.47978438 0.5701428 0.8356936 0.07231279 0.2393844 0.6641952
## 200223270003_R07C01 0.02261996 0.5683381 0.4245893 0.78303831 0.7071501 0.7148058
## 200223270006_R01C01 0.37465798 0.5233692 0.8835151 0.78219952 0.2301078 0.6138954
## 200223270006_R04C01 0.03570795 0.9188670 0.4514661 0.44408249 0.6891513 0.9414188
## cg27639199 cg11187460 cg21209485 cg14527649 cg23161429 cg19512141
## 200223270003_R02C01 0.67515415 0.03672179 0.8865053 0.2678912 0.8956965 0.8209161
## 200223270003_R03C01 0.67552763 0.92516409 0.8714878 0.7954683 0.9099619 0.7903543
## 200223270003_R06C01 0.06233093 0.03109553 0.2292550 0.8350610 0.8833895 0.8404684
## 200223270003_R07C01 0.05701332 0.53283119 0.2351526 0.8428684 0.9134709 0.2202759
## 200223270006_R01C01 0.05037694 0.54038146 0.8882046 0.8231348 0.8738558 0.8059589
## 200223270006_R04C01 0.08144161 0.91096169 0.2292483 0.8022444 0.9104210 0.7020247
## cg02320265 cg20370184 cg12284872 cg04664583 cg11247378 cg26069044
## 200223270003_R02C01 0.8853213 0.37710950 0.8008333 0.5572814 0.1591185 0.92401867
## 200223270003_R03C01 0.4686314 0.05737964 0.7414569 0.5881190 0.7874849 0.94072227
## 200223270003_R06C01 0.4838749 0.04740505 0.7725267 0.9352717 0.4807942 0.93321315
## 200223270003_R07C01 0.8986848 0.83572095 0.7573369 0.9350230 0.4537348 0.56567694
## 200223270006_R01C01 0.8987560 0.04056608 0.7201607 0.9424588 0.1537079 0.94369927
## 200223270006_R04C01 0.4768520 0.04038589 0.8021446 0.9379537 0.1686356 0.02040391
## cg25879395 cg00999469 cg06112204 cg02932958 cg19377607 cg12784167
## 200223270003_R02C01 0.88130864 0.3274080 0.5251592 0.7901008 0.05377464 0.81503498
## 200223270003_R03C01 0.02603438 0.2857719 0.8773488 0.4210489 0.90570746 0.02811410
## 200223270003_R06C01 0.91060615 0.2499229 0.8867975 0.3825995 0.06636174 0.03073269
## 200223270003_R07C01 0.89205942 0.2819622 0.5613799 0.7617081 0.68788639 0.84775699
## 200223270006_R01C01 0.47886249 0.2933539 0.9184122 0.8431126 0.06338988 0.83825789
## 200223270006_R04C01 0.02145248 0.2966623 0.9152514 0.7610084 0.91551446 0.45475291
## cg07480176 cg00696044 cg18819889 cg00689685 cg00675157 cg03660162
## 200223270003_R02C01 0.5171664 0.55608424 0.9156157 0.7019389 0.9188438 0.8691767
## 200223270003_R03C01 0.3760452 0.07552381 0.9004455 0.8634268 0.9242325 0.5160770
## 200223270003_R06C01 0.6998389 0.79270858 0.9054439 0.6378795 0.9254708 0.9026304
## 200223270003_R07C01 0.2189042 0.03548419 0.9089935 0.8624541 0.5447244 0.5305691
## 200223270006_R01C01 0.5570021 0.10714386 0.9065397 0.6361891 0.5173554 0.9257451
## 200223270006_R04C01 0.4501196 0.18420803 0.9242767 0.6356260 0.9247232 0.8935772
## cg10985055 cg07138269 cg21697769 cg08779649 cg01933473 cg17906851
## 200223270003_R02C01 0.8518169 0.5002290 0.8946108 0.44449401 0.2589014 0.9488392
## 200223270003_R03C01 0.8631895 0.9426707 0.2822953 0.45076825 0.6726133 0.9529718
## 200223270003_R06C01 0.5456633 0.5057781 0.8698740 0.04810217 0.2642560 0.6462151
## 200223270003_R07C01 0.8825100 0.9400527 0.9134887 0.42715969 0.1978068 0.9553497
## 200223270006_R01C01 0.8841690 0.9321602 0.2683820 0.89313476 0.7599441 0.6222117
## 200223270006_R04C01 0.8407797 0.9333501 0.2765740 0.59523771 0.7405661 0.6441202
## cg14307563 cg12776173 cg24851651 cg08584917 cg16788319 cg24506579
## 200223270003_R02C01 0.1855966 0.10388038 0.03674702 0.5663205 0.9379870 0.5244337
## 200223270003_R03C01 0.8916957 0.87306345 0.05358297 0.9019732 0.8913429 0.5794845
## 200223270003_R06C01 0.8750052 0.70094907 0.05968923 0.9187789 0.8680680 0.9427785
## 200223270003_R07C01 0.8975663 0.11367159 0.60864179 0.6007449 0.8811444 0.9323844
## 200223270006_R01C01 0.8762842 0.09458405 0.08825834 0.9069098 0.3123481 0.9185355
## 200223270006_R04C01 0.9168614 0.86532175 0.91932068 0.9263584 0.2995627 0.4332642
## cg01549082 cg12466610 cg15633912 cg01413796 cg20678988
## 200223270003_R02C01 0.2924138 0.05767659 0.1605530 0.1345128 0.8438718
## 200223270003_R03C01 0.7065693 0.59131778 0.9333421 0.2830672 0.8548886
## 200223270003_R06C01 0.2895440 0.06939623 0.8737362 0.8194681 0.7786685
## 200223270003_R07C01 0.6422955 0.04527733 0.9137334 0.9007710 0.8260541
## 200223270006_R01C01 0.8471236 0.05212904 0.9169706 0.2603027 0.3295384
## 200223270006_R04C01 0.6949888 0.05104033 0.8890004 0.9207672 0.8541667
dim(df_selected_Median)
## [1] 648 156
print(Selected_median_imp_Name)
## [1] "PC1" "cg00962106" "cg16652920" "PC3" "cg27452255" "cg08861434" "cg06864789"
## [8] "cg08857872" "cg07152869" "cg09584650" "cg16749614" "age.now" "cg05096415" "cg23432430"
## [15] "cg01921484" "cg02225060" "cg02981548" "cg14710850" "cg19503462" "PC2" "cg17186592"
## [22] "cg00247094" "cg11133939" "cg25259265" "cg16715186" "cg05570109" "cg26948066" "cg02494911"
## [29] "cg14293999" "cg14924512" "cg02621446" "cg03129555" "cg04412904" "cg26219488" "cg00154902"
## [36] "cg20913114" "cg03084184" "cg12279734" "cg01153376" "cg16771215" "cg04248279" "cg06536614"
## [43] "cg09854620" "cg06378561" "cg24859648" "cg10240127" "cg12228670" "cg03327352" "cg12146221"
## [50] "cg03982462" "cg05841700" "cg15865722" "cg07523188" "cg11227702" "cg10369879" "cg16579946"
## [57] "cg24861747" "cg14564293" "cg01128042" "cg00616572" "cg08198851" "cg17421046" "cg15535896"
## [64] "cg18339359" "cg00322003" "cg02372404" "cg11331837" "cg23658987" "cg10738648" "cg25561557"
## [71] "cg01667144" "cg05234269" "cg12534577" "cg06118351" "cg13885788" "cg10750306" "cg15775217"
## [78] "cg01013522" "cg26474732" "cg27086157" "cg03088219" "cg15501526" "cg27577781" "cg11438323"
## [85] "cg06715136" "cg17738613" "cg01680303" "cg06697310" "cg22274273" "cg12738248" "cg21854924"
## [92] "cg14240646" "cg03071582" "cg24873924" "cg17429539" "cg06950937" "cg13080267" "cg27272246"
## [99] "cg27341708" "cg18821122" "cg12682323" "cg12012426" "cg05321907" "cg20139683" "cg20685672"
## [106] "cg26757229" "cg25436480" "cg23916408" "cg20507276" "cg02356645" "cg07028768" "cg00272795"
## [113] "cg25758034" "cg16178271" "cg27639199" "cg11187460" "cg21209485" "cg14527649" "cg23161429"
## [120] "cg19512141" "cg02320265" "cg20370184" "cg12284872" "cg04664583" "cg11247378" "cg26069044"
## [127] "cg25879395" "cg00999469" "cg06112204" "cg02932958" "cg19377607" "cg12784167" "cg07480176"
## [134] "cg00696044" "cg18819889" "cg00689685" "cg00675157" "cg03660162" "cg10985055" "cg07138269"
## [141] "cg21697769" "cg08779649" "cg01933473" "cg17906851" "cg14307563" "cg12776173" "cg24851651"
## [148] "cg08584917" "cg16788319" "cg24506579" "cg01549082" "cg12466610" "cg15633912" "cg01413796"
## [155] "cg20678988"
print(head(output_median_feature))
## # A tibble: 6 × 156
## DX PC1 cg00962106 cg16652920 PC3 cg27452255 cg08861434 cg06864789 cg08857872
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 MCI -0.214 0.912 0.944 -0.0140 0.900 0.877 0.0537 0.340
## 2 CN -0.173 0.538 0.943 0.00506 0.659 0.435 0.461 0.818
## 3 CN -0.00367 0.504 0.946 0.0291 0.901 0.870 0.875 0.297
## 4 Dementia -0.187 0.904 0.942 -0.0323 0.890 0.471 0.490 0.295
## 5 MCI 0.0268 0.896 0.953 0.0529 0.578 0.862 0.479 0.894
## 6 CN -0.0379 0.886 0.949 -0.00869 0.881 0.906 0.0542 0.890
## # ℹ 147 more variables: cg07152869 <dbl>, cg09584650 <dbl>, cg16749614 <dbl>, age.now <dbl>,
## # cg05096415 <dbl>, cg23432430 <dbl>, cg01921484 <dbl>, cg02225060 <dbl>, cg02981548 <dbl>,
## # cg14710850 <dbl>, cg19503462 <dbl>, PC2 <dbl>, cg17186592 <dbl>, cg00247094 <dbl>,
## # cg11133939 <dbl>, cg25259265 <dbl>, cg16715186 <dbl>, cg05570109 <dbl>, cg26948066 <dbl>,
## # cg02494911 <dbl>, cg14293999 <dbl>, cg14924512 <dbl>, cg02621446 <dbl>, cg03129555 <dbl>,
## # cg04412904 <dbl>, cg26219488 <dbl>, cg00154902 <dbl>, cg20913114 <dbl>, cg03084184 <dbl>,
## # cg12279734 <dbl>, cg01153376 <dbl>, cg16771215 <dbl>, cg04248279 <dbl>, cg06536614 <dbl>, …
A feature is treated as mutually important when it appears in the top lists of at least half of the models (i.e. at least 3 of the 5 in our case). The frequency / common feature importance is computed as follows:
n_select_frequencyWay <- Number_fea_input
combined_importance_freq_ordered_df <- combined_importance_Avg_ordered
df_Selected_Frequency_Imp <- function(n_select_frequencyWay, FeatureImportanceTable){
# Input: a feature importance data frame with one normalized importance
# column per model.
# Output: a frequency table recording, for each feature, whether it appears
# among the Top n_select_frequencyWay features of each model, together with
# the total number of models that select it.
# (Note: the original version ignored FeatureImportanceTable and read the
# global table directly; the argument is now used.)
models <- c("LRM", "XGB", "ENM", "RF", "SVM")
importance_cols <- c("Importance_LRM1", "Importance_XGB", "Importance_ENM1",
"Importance_RF", "Importance_SVM")
# For each model, order by that model's importance and keep the top-n names
top_names <- lapply(importance_cols, function(col){
ordered <- FeatureImportanceTable[order(-FeatureImportanceTable[[col]]), ]
head(ordered, n = n_select_frequencyWay)$Feature
})
names(top_names) <- models
# Combine all top features into a unique collection
all_features <- unique(unlist(top_names))
# Presence (1) or absence (0) of each feature in each model's top list
feature_matrix <- matrix(0, nrow = length(all_features), ncol = length(models),
dimnames = list(all_features, models))
for (m in models) {
feature_matrix[, m] <- as.integer(all_features %in% top_names[[m]])
}
# Convert the matrix to a data frame and count the selecting models
feature_df <- as.data.frame(feature_matrix)
feature_df$Total_Count <- rowSums(feature_df[, models])
# Sort the data frame by Total_Count in descending order
feature_df <- feature_df[order(-feature_df$Total_Count), ]
print(feature_df)
return(feature_df)
}
Now the function is tested below:
df_Func_test<-df_Selected_Frequency_Imp(NUM_COMMON_FEATURES_SET_Frequency,combined_importance_freq_ordered_df)
## LRM XGB ENM RF SVM Total_Count
## cg00962106 1 1 1 1 0 4
## PC1 1 0 1 0 1 3
## cg14710850 1 0 1 1 0 3
## cg02981548 1 1 1 0 0 3
## cg08861434 1 0 1 0 1 3
## cg07152869 1 0 1 0 1 3
## cg05096415 1 1 0 0 1 3
## cg23432430 1 0 1 0 1 3
## cg17186592 1 0 0 1 1 3
## cg09584650 1 1 1 0 0 3
## age.now 0 1 0 1 1 3
## cg16652920 0 1 1 1 0 3
## cg06864789 0 1 1 1 0 3
## cg08857872 0 1 1 1 0 3
## cg01921484 0 1 0 1 1 3
## cg26948066 0 1 1 0 1 3
## PC2 1 0 1 0 0 2
## PC3 1 0 1 0 0 2
## cg02225060 1 0 1 0 0 2
## cg27452255 1 0 1 0 0 2
## cg19503462 1 0 1 0 0 2
## cg16749614 1 0 1 0 0 2
## cg11133939 1 0 1 0 0 2
## cg15501526 0 1 0 1 0 2
## cg25259265 0 1 0 1 0 2
## cg01128042 0 1 0 0 1 2
## cg02494911 0 1 0 1 0 2
## cg12279734 0 0 0 1 1 2
## cg00247094 1 0 0 0 0 1
## cg16715186 1 0 0 0 0 1
## cg03129555 1 0 0 0 0 1
## cg14564293 0 1 0 0 0 1
## cg04412904 0 1 0 0 0 1
## cg16771215 0 1 0 0 0 1
## cg02621446 0 1 0 0 0 1
## cg15865722 0 1 0 0 0 1
## cg03327352 0 1 0 0 0 1
## cg02372404 0 0 1 0 0 1
## cg01153376 0 0 0 1 0 1
## cg23658987 0 0 0 1 0 1
## cg14293999 0 0 0 1 0 1
## cg05570109 0 0 0 1 0 1
## cg21209485 0 0 0 1 0 1
## cg16579946 0 0 0 1 0 1
## cg14924512 0 0 0 1 0 1
## cg07523188 0 0 0 1 0 1
## cg25879395 0 0 0 0 1 1
## cg26757229 0 0 0 0 1 1
## cg26069044 0 0 0 0 1 1
## cg00999469 0 0 0 0 1 1
## cg24861747 0 0 0 0 1 1
## cg01013522 0 0 0 0 1 1
## cg05234269 0 0 0 0 1 1
## cg00616572 0 0 0 0 1 1
## cg01680303 0 0 0 0 1 1
# The expected output is zero, i.e. the function reproduces the raw frequency table.
sum(df_Func_test!=frequency_feature_df_RAW_ordered)
## [1] 0
As before, a feature is treated as mutually important when it appears in the top lists of at least half of the models (i.e. at least 3 of the 5). The frequency / common feature importance is now computed with the full selection size:
n_select_frequencyWay <- Number_fea_input
df_feature_Output_frequency <- df_Selected_Frequency_Imp(Number_fea_input,
combined_importance_freq_ordered_df)
## LRM XGB ENM RF SVM Total_Count
## PC1 1 1 1 1 1 5
## PC2 1 1 1 1 1 5
## PC3 1 1 1 1 1 5
## cg00962106 1 1 1 1 1 5
## cg02225060 1 1 1 1 1 5
## cg14710850 1 1 1 1 1 5
## cg27452255 1 1 1 1 1 5
## cg02981548 1 1 1 1 1 5
## cg08861434 1 1 1 1 1 5
## cg19503462 1 1 1 1 1 5
## cg07152869 1 1 1 1 1 5
## cg16749614 1 1 1 1 1 5
## cg05096415 1 1 1 1 1 5
## cg23432430 1 1 1 1 1 5
## cg17186592 1 1 1 1 1 5
## cg00247094 1 1 1 1 1 5
## cg09584650 1 1 1 1 1 5
## cg11133939 1 1 1 1 1 5
## cg16715186 1 1 1 1 1 5
## cg03129555 1 1 1 1 1 5
## cg08857872 1 1 1 1 1 5
## cg06864789 1 1 1 1 1 5
## cg14924512 1 1 1 1 1 5
## cg16652920 1 1 1 1 1 5
## cg03084184 1 1 1 1 1 5
## cg26219488 1 1 1 1 1 5
## cg20913114 1 1 1 1 1 5
## cg06378561 1 1 1 1 1 5
## cg26948066 1 1 1 1 1 5
## cg25259265 1 1 1 1 1 5
## cg06536614 1 1 1 1 1 5
## cg24859648 1 1 1 1 1 5
## cg12279734 1 1 1 1 1 5
## cg03982462 1 1 1 1 1 5
## cg05841700 1 1 1 1 1 5
## cg11227702 1 1 1 1 1 5
## cg12146221 1 1 1 1 1 5
## cg02621446 1 1 1 1 1 5
## cg00616572 1 1 1 1 1 5
## cg15535896 1 1 1 1 1 5
## cg02372404 1 1 1 1 1 5
## cg09854620 1 1 1 1 1 5
## cg04248279 1 1 1 1 1 5
## cg20678988 1 1 1 1 1 5
## cg24861747 1 1 1 1 1 5
## cg10240127 1 1 1 1 1 5
## cg16771215 1 1 1 1 1 5
## cg01667144 1 1 1 1 1 5
## cg13080267 1 1 1 1 1 5
## cg02494911 1 1 1 1 1 5
## cg10750306 1 1 1 1 1 5
## cg11438323 1 1 1 1 1 5
## cg06715136 1 1 1 1 1 5
## cg04412904 1 1 1 1 1 5
## cg12738248 1 1 1 1 1 5
## cg03071582 1 1 1 1 1 5
## cg05570109 1 1 1 1 1 5
## cg15775217 1 1 1 1 1 5
## cg24873924 1 1 1 1 1 5
## cg17738613 1 1 1 1 1 5
## cg01921484 1 1 1 1 1 5
## cg10369879 1 1 1 1 1 5
## cg27341708 1 1 1 1 1 5
## cg12534577 1 1 1 1 1 5
## cg18821122 1 1 1 1 1 5
## cg12682323 1 1 1 1 1 5
## cg05234269 1 1 1 1 1 5
## cg20685672 1 1 1 1 1 5
## cg12228670 1 1 1 1 1 5
## cg11331837 1 1 1 1 1 5
## cg01680303 1 1 1 1 1 5
## cg17421046 1 1 1 1 1 5
## cg03088219 1 1 1 1 1 5
## cg02356645 1 1 1 1 1 5
## cg00322003 1 1 1 1 1 5
## cg01013522 1 1 1 1 1 5
## cg00272795 1 1 1 1 1 5
## cg25758034 1 1 1 1 1 5
## cg26474732 1 1 1 1 1 5
## cg16579946 1 1 1 1 1 5
## cg07523188 1 1 1 1 1 5
## cg11187460 1 1 1 1 1 5
## cg14527649 1 1 1 1 1 5
## cg20370184 1 1 1 1 1 5
## cg17429539 1 1 1 1 1 5
## cg20507276 1 1 1 1 1 5
## cg13885788 1 1 1 1 1 5
## cg16178271 1 1 1 1 1 5
## cg10738648 1 1 1 1 1 5
## cg26069044 1 1 1 1 1 5
## cg25879395 1 1 1 1 1 5
## cg06112204 1 1 1 1 1 5
## cg23161429 1 1 1 1 1 5
## cg25436480 1 1 1 1 1 5
## cg26757229 1 1 1 1 1 5
## cg02932958 1 1 1 1 1 5
## cg18339359 1 1 1 1 1 5
## cg23916408 1 1 1 1 1 5
## cg06950937 1 1 1 1 1 5
## cg12784167 1 1 1 1 1 5
## cg07480176 1 1 1 1 1 5
## cg15865722 1 1 1 1 1 5
## cg27577781 1 1 1 1 1 5
## cg05321907 1 1 1 1 1 5
## cg03660162 1 1 1 1 1 5
## cg07138269 1 1 1 1 1 5
## cg20139683 1 1 1 1 1 5
## cg12284872 1 1 1 1 1 5
## cg03327352 1 1 1 1 1 5
## cg23658987 1 1 1 1 1 5
## cg21854924 1 1 1 1 1 5
## cg21697769 1 1 1 1 1 5
## cg19512141 1 1 1 1 1 5
## cg08198851 1 1 1 1 1 5
## cg00675157 1 1 1 1 1 5
## cg01153376 1 1 1 1 1 5
## cg01933473 1 1 1 1 1 5
## cg12776173 1 1 1 1 1 5
## cg14564293 1 1 1 1 1 5
## cg24851651 1 1 1 1 1 5
## cg22274273 1 1 1 1 1 5
## cg25561557 1 1 1 1 1 5
## cg21209485 1 1 1 1 1 5
## cg10985055 1 1 1 1 1 5
## cg14293999 1 1 1 1 1 5
## cg18819889 1 1 1 1 1 5
## cg24506579 1 1 1 1 1 5
## cg19377607 1 1 1 1 1 5
## cg06697310 1 1 1 1 1 5
## cg00696044 1 1 1 1 1 5
## cg01549082 1 1 1 1 1 5
## cg01128042 1 1 1 1 1 5
## cg00999469 1 1 1 1 1 5
## cg06118351 1 1 1 1 1 5
## cg12012426 1 1 1 1 1 5
## cg08584917 1 1 1 1 1 5
## cg27272246 1 1 1 1 1 5
## cg15633912 1 1 1 1 1 5
## cg16788319 1 1 1 1 1 5
## cg17906851 1 1 1 1 1 5
## cg07028768 1 1 1 1 1 5
## cg27086157 1 1 1 1 1 5
## cg14240646 1 1 1 1 1 5
## cg00154902 1 1 1 1 1 5
## cg14307563 1 1 1 1 1 5
## cg02320265 1 1 1 1 1 5
## cg08779649 1 1 1 1 1 5
## cg04664583 1 1 1 1 1 5
## cg12466610 1 1 1 1 1 5
## cg27639199 1 1 1 1 1 5
## cg15501526 1 1 1 1 1 5
## cg00689685 1 1 1 1 1 5
## cg01413796 1 1 1 1 1 5
## cg11247378 1 1 1 1 1 5
## age.now 1 1 1 1 1 5
all_out_features <- union(combined_importance_freq_ordered_df$Feature, rownames(df_feature_Output_frequency))
# Note: the combined importance table used here is the one before filtering.
# Combine the tables based on the common (frequency) feature selection method:
# if a feature from the earlier importance table is missing here, add it with value zero.
feature_output_df_full <- data.frame(Feature = all_out_features)
feature_output_df_full <- merge(feature_output_df_full, df_feature_Output_frequency, by.x = "Feature", by.y = "row.names", all.x = TRUE)
feature_output_df_full[is.na(feature_output_df_full)] <- 0
# For top_impAvg_ordered
all_output_impAvg_ordered_full <- data.frame(Feature = all_out_features)
all_output_impAvg_ordered_full <- merge(combined_importance_freq_ordered_df,
all_output_impAvg_ordered_full,
by.x = "Feature",
by.y = "Feature",
all.x = TRUE)
all_output_impAvg_ordered_full[is.na(all_output_impAvg_ordered_full)] <- 0
all_Output_combined_df_impAvg <- merge(feature_output_df_full,
all_output_impAvg_ordered_full,
by = "Feature",
all = TRUE)
print(head(feature_output_df_full))
## Feature LRM XGB ENM RF SVM Total_Count
## 1 age.now 1 1 1 1 1 5
## 2 cg00154902 1 1 1 1 1 5
## 3 cg00247094 1 1 1 1 1 5
## 4 cg00272795 1 1 1 1 1 5
## 5 cg00322003 1 1 1 1 1 5
## 6 cg00616572 1 1 1 1 1 5
print(head(all_output_impAvg_ordered_full))
## Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM
## 1 age.now 0.00000000 1.0000000 0.0000000 0.45372158 0.8333333
## 2 cg00154902 0.08879263 0.2688349 0.3713159 0.33502754 0.5833333
## 3 cg00247094 0.41278095 0.2185408 0.4245031 0.23013585 0.5833333
## 4 cg00272795 0.21295491 0.1985510 0.2309999 0.09024509 0.3333333
## 5 cg00322003 0.21752832 0.1465702 0.3430531 0.27821774 0.5833333
## 6 cg00616572 0.28381319 0.1715595 0.3572845 0.17891065 0.6666667
## Average_Importance
## 1 0.4574110
## 2 0.3294609
## 3 0.3738588
## 4 0.2132168
## 5 0.3137405
## 6 0.3316469
print(head(all_Output_combined_df_impAvg))
## Feature LRM XGB ENM RF SVM Total_Count Importance_LRM1 Importance_XGB Importance_ENM1
## 1 age.now 1 1 1 1 1 5 0.00000000 1.0000000 0.0000000
## 2 cg00154902 1 1 1 1 1 5 0.08879263 0.2688349 0.3713159
## 3 cg00247094 1 1 1 1 1 5 0.41278095 0.2185408 0.4245031
## 4 cg00272795 1 1 1 1 1 5 0.21295491 0.1985510 0.2309999
## 5 cg00322003 1 1 1 1 1 5 0.21752832 0.1465702 0.3430531
## 6 cg00616572 1 1 1 1 1 5 0.28381319 0.1715595 0.3572845
## Importance_RF Importance_SVM Average_Importance
## 1 0.45372158 0.8333333 0.4574110
## 2 0.33502754 0.5833333 0.3294609
## 3 0.23013585 0.5833333 0.3738588
## 4 0.09024509 0.3333333 0.2132168
## 5 0.27821774 0.5833333 0.3137405
## 6 0.17891065 0.6666667 0.3316469
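The merge-and-zero-fill step above (left join on the feature universe, then replacing `NA` with 0) can be illustrated on a toy pair of tables with hypothetical features:

```r
# Full feature universe vs. a frequency table that covers only part of it
full <- data.frame(Feature = c("cg_A", "cg_B", "cg_C"))
freq <- data.frame(Total_Count = c(5, 3), row.names = c("cg_A", "cg_C"))
# Left join keeps every feature; unmatched rows get NA
merged <- merge(full, freq, by.x = "Feature", by.y = "row.names", all.x = TRUE)
# Features absent from the frequency table are counted as zero
merged[is.na(merged)] <- 0
print(merged)
```

`cg_B` is not in `freq`, so after the fill its `Total_Count` is 0 rather than `NA`, which keeps later `>=` comparisons well-defined.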
Finally, keep the mutually important features: those that appear in at least half of the models' (i.e. at least 3 of the 5) top selected feature lists.
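The at-least-half rule applied in every branch below can be sketched in isolation on a toy presence table (hypothetical features and counts):

```r
# Presence (1) / absence (0) of each feature in each model's top list
toy_freq <- data.frame(
  LRM = c(1, 1, 0), XGB = c(1, 0, 0), ENM = c(1, 1, 0),
  RF  = c(1, 0, 1), SVM = c(0, 0, 0),
  row.names = c("cg_A", "cg_B", "cg_C")
)
toy_freq$Total_Count <- rowSums(toy_freq)
# Keep features selected by at least half of the 5 models (>= 3)
kept <- rownames(toy_freq[toy_freq$Total_Count >= 3, ])
print(kept)
## [1] "cg_A"
```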
# Keep a feature when it appears in the top-importance list of at least 3 of the 5 models.
# (METHOD_FEATURE_FLAG == 2 is the PCA method and does not use frequency selection here.)
if(METHOD_FEATURE_FLAG %in% c(1, 3, 4, 5, 6)){
  m_suffix <- paste0("m", METHOD_FEATURE_FLAG)
  df_process_frequency_FeatureName <- rownames(df_feature_Output_frequency[df_feature_Output_frequency$Total_Count>=3,])
  df_process_Output_freq <- get(paste0("processed_data_", m_suffix, "_df"))[,c("DX",df_process_frequency_FeatureName)]
  output_Frequency_Feature <- get(paste0("processed_data_", m_suffix))[,c("DX",df_process_frequency_FeatureName)]
  print(head(output_Frequency_Feature))
  print(paste("The number of final used features of common importance method:", length(df_process_frequency_FeatureName) ))
  print(df_process_frequency_FeatureName)
  print(head(df_process_Output_freq))
}
## # A tibble: 6 × 156
## DX PC1 PC2 PC3 cg00962106 cg02225060 cg14710850 cg27452255 cg02981548
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 MCI -0.214 0.0147 -0.0140 0.912 0.683 0.805 0.900 0.134
## 2 CN -0.173 0.0575 0.00506 0.538 0.827 0.809 0.659 0.522
## 3 CN -0.00367 0.0837 0.0291 0.504 0.521 0.829 0.901 0.510
## 4 Dementia -0.187 -0.0112 -0.0323 0.904 0.808 0.834 0.890 0.566
## 5 MCI 0.0268 0.0000165 0.0529 0.896 0.608 0.850 0.578 0.568
## 6 CN -0.0379 0.0157 -0.00869 0.886 0.764 0.821 0.881 0.508
## # ℹ 147 more variables: cg08861434 <dbl>, cg19503462 <dbl>, cg07152869 <dbl>, cg16749614 <dbl>,
## # cg05096415 <dbl>, cg23432430 <dbl>, cg17186592 <dbl>, cg00247094 <dbl>, cg09584650 <dbl>,
## # cg11133939 <dbl>, cg16715186 <dbl>, cg03129555 <dbl>, cg08857872 <dbl>, cg06864789 <dbl>,
## # cg14924512 <dbl>, cg16652920 <dbl>, cg03084184 <dbl>, cg26219488 <dbl>, cg20913114 <dbl>,
## # cg06378561 <dbl>, cg26948066 <dbl>, cg25259265 <dbl>, cg06536614 <dbl>, cg24859648 <dbl>,
## # cg12279734 <dbl>, cg03982462 <dbl>, cg05841700 <dbl>, cg11227702 <dbl>, cg12146221 <dbl>,
## # cg02621446 <dbl>, cg00616572 <dbl>, cg15535896 <dbl>, cg02372404 <dbl>, cg09854620 <dbl>, …
## [1] "The number of final used features of common importance method: 155"
## [1] "PC1" "PC2" "PC3" "cg00962106" "cg02225060" "cg14710850" "cg27452255"
## [8] "cg02981548" "cg08861434" "cg19503462" "cg07152869" "cg16749614" "cg05096415" "cg23432430"
## [15] "cg17186592" "cg00247094" "cg09584650" "cg11133939" "cg16715186" "cg03129555" "cg08857872"
## [22] "cg06864789" "cg14924512" "cg16652920" "cg03084184" "cg26219488" "cg20913114" "cg06378561"
## [29] "cg26948066" "cg25259265" "cg06536614" "cg24859648" "cg12279734" "cg03982462" "cg05841700"
## [36] "cg11227702" "cg12146221" "cg02621446" "cg00616572" "cg15535896" "cg02372404" "cg09854620"
## [43] "cg04248279" "cg20678988" "cg24861747" "cg10240127" "cg16771215" "cg01667144" "cg13080267"
## [50] "cg02494911" "cg10750306" "cg11438323" "cg06715136" "cg04412904" "cg12738248" "cg03071582"
## [57] "cg05570109" "cg15775217" "cg24873924" "cg17738613" "cg01921484" "cg10369879" "cg27341708"
## [64] "cg12534577" "cg18821122" "cg12682323" "cg05234269" "cg20685672" "cg12228670" "cg11331837"
## [71] "cg01680303" "cg17421046" "cg03088219" "cg02356645" "cg00322003" "cg01013522" "cg00272795"
## [78] "cg25758034" "cg26474732" "cg16579946" "cg07523188" "cg11187460" "cg14527649" "cg20370184"
## [85] "cg17429539" "cg20507276" "cg13885788" "cg16178271" "cg10738648" "cg26069044" "cg25879395"
## [92] "cg06112204" "cg23161429" "cg25436480" "cg26757229" "cg02932958" "cg18339359" "cg23916408"
## [99] "cg06950937" "cg12784167" "cg07480176" "cg15865722" "cg27577781" "cg05321907" "cg03660162"
## [106] "cg07138269" "cg20139683" "cg12284872" "cg03327352" "cg23658987" "cg21854924" "cg21697769"
## [113] "cg19512141" "cg08198851" "cg00675157" "cg01153376" "cg01933473" "cg12776173" "cg14564293"
## [120] "cg24851651" "cg22274273" "cg25561557" "cg21209485" "cg10985055" "cg14293999" "cg18819889"
## [127] "cg24506579" "cg19377607" "cg06697310" "cg00696044" "cg01549082" "cg01128042" "cg00999469"
## [134] "cg06118351" "cg12012426" "cg08584917" "cg27272246" "cg15633912" "cg16788319" "cg17906851"
## [141] "cg07028768" "cg27086157" "cg14240646" "cg00154902" "cg14307563" "cg02320265" "cg08779649"
## [148] "cg04664583" "cg12466610" "cg27639199" "cg15501526" "cg00689685" "cg01413796" "cg11247378"
## [155] "age.now"
## DX PC1 PC2 PC3 cg00962106 cg02225060
## 200223270003_R02C01 MCI -0.214185447 1.470293e-02 -0.014043316 0.9124898 0.6828159
## 200223270003_R03C01 CN -0.172761185 5.745834e-02 0.005055871 0.5375751 0.8265195
## 200223270003_R06C01 CN -0.003667305 8.372861e-02 0.029143653 0.5040948 0.5209552
## 200223270003_R07C01 Dementia -0.186779607 -1.117250e-02 -0.032302430 0.9039029 0.8078889
## 200223270006_R01C01 MCI 0.026814649 1.650735e-05 0.052947950 0.8961556 0.6084903
## 200223270006_R04C01 CN -0.037862929 1.571950e-02 -0.008685676 0.8857597 0.7638781
## cg14710850 cg27452255 cg02981548 cg08861434 cg19503462 cg07152869
## 200223270003_R02C01 0.8048592 0.9001010 0.1342571 0.8768306 0.7951675 0.8284151
## 200223270003_R03C01 0.8090950 0.6593379 0.5220037 0.4352647 0.4537684 0.5050630
## 200223270003_R06C01 0.8285902 0.9012217 0.5098965 0.8698813 0.6997359 0.8352490
## 200223270003_R07C01 0.8336457 0.8898635 0.5660985 0.4709249 0.7189778 0.5194300
## 200223270006_R01C01 0.8500725 0.5779792 0.5678714 0.8618532 0.7301755 0.5025709
## 200223270006_R04C01 0.8207247 0.8809143 0.5079859 0.9058965 0.4207207 0.8080916
## cg16749614 cg05096415 cg23432430 cg17186592 cg00247094 cg09584650
## 200223270003_R02C01 0.8678741 0.9182527 0.9482702 0.9230463 0.5399349 0.08230254
## 200223270003_R03C01 0.8539348 0.5177819 0.9455418 0.8593448 0.9315640 0.09661586
## 200223270003_R06C01 0.5874127 0.6288426 0.9418716 0.8467599 0.5177874 0.52399749
## 200223270003_R07C01 0.5555391 0.6060271 0.9426559 0.4986373 0.5377765 0.11587211
## 200223270006_R01C01 0.8026346 0.5599588 0.9461736 0.8978999 0.9109309 0.42115185
## 200223270006_R04C01 0.7903978 0.5441200 0.9508404 0.9239750 0.5266535 0.56043178
## cg11133939 cg16715186 cg03129555 cg08857872 cg06864789 cg14924512
## 200223270003_R02C01 0.1282694 0.2742789 0.6079616 0.3395280 0.05369415 0.5303907
## 200223270003_R03C01 0.5920898 0.7946153 0.5785498 0.8181845 0.46053125 0.9160885
## 200223270003_R06C01 0.5127706 0.8124316 0.9137818 0.2970779 0.87513655 0.9088414
## 200223270003_R07C01 0.8474176 0.7773263 0.9043041 0.2954090 0.49020327 0.9081681
## 200223270006_R01C01 0.8589133 0.8334531 0.9286357 0.8935876 0.47852685 0.9111789
## 200223270006_R04C01 0.5246557 0.8039945 0.9088564 0.8901338 0.05423587 0.5331753
## cg16652920 cg03084184 cg26219488 cg20913114 cg06378561 cg26948066
## 200223270003_R02C01 0.9436000 0.8162981 0.9336638 0.36510482 0.9389306 0.4685225
## 200223270003_R03C01 0.9431222 0.7877128 0.9134707 0.80382984 0.9377503 0.5026045
## 200223270003_R06C01 0.9457161 0.4546397 0.9261878 0.03158439 0.5154019 0.9101976
## 200223270003_R07C01 0.9419785 0.7812413 0.9217866 0.81256840 0.9403569 0.9379543
## 200223270006_R01C01 0.9529417 0.7818230 0.4929692 0.81502059 0.4956816 0.9120181
## 200223270006_R04C01 0.9492648 0.7725853 0.9431574 0.90468830 0.9268832 0.8868608
## cg25259265 cg06536614 cg24859648 cg12279734 cg03982462 cg05841700
## 200223270003_R02C01 0.4356646 0.5824474 0.83777536 0.6435368 0.8562777 0.2923544
## 200223270003_R03C01 0.8893591 0.5746694 0.44392797 0.1494651 0.6023731 0.9146488
## 200223270003_R06C01 0.4201700 0.5773468 0.03341185 0.8760759 0.8778458 0.3737990
## 200223270003_R07C01 0.4455517 0.5848917 0.43582347 0.8674214 0.8860227 0.5046468
## 200223270006_R01C01 0.8423337 0.5669919 0.03087161 0.6454450 0.8703107 0.8419031
## 200223270006_R04C01 0.8460736 0.5718514 0.02588024 0.8660058 0.8792860 0.9286652
## cg11227702 cg12146221 cg02621446 cg00616572 cg15535896 cg02372404
## 200223270003_R02C01 0.86486075 0.2049284 0.8731313 0.9335067 0.3382952 0.03598249
## 200223270003_R03C01 0.49184121 0.1814927 0.8095534 0.9214079 0.9253926 0.02767285
## 200223270003_R06C01 0.02543724 0.8619250 0.7511582 0.9113633 0.3320191 0.03127855
## 200223270003_R07C01 0.45150971 0.1238469 0.8773609 0.9160238 0.9409104 0.55685785
## 200223270006_R01C01 0.89086877 0.2021598 0.2046541 0.4861334 0.9326027 0.02587736
## 200223270006_R04C01 0.87675947 0.1383786 0.7963817 0.9067928 0.9156401 0.02828648
## cg09854620 cg04248279 cg20678988 cg24861747 cg10240127 cg16771215
## 200223270003_R02C01 0.5220587 0.8534976 0.8438718 0.3540897 0.9250553 0.88389723
## 200223270003_R03C01 0.8739646 0.8458854 0.8548886 0.4309505 0.9403255 0.07196933
## 200223270003_R06C01 0.8973149 0.8332786 0.7786685 0.8071462 0.9056974 0.09949974
## 200223270003_R07C01 0.8958863 0.3303204 0.8260541 0.3347317 0.9396217 0.64234023
## 200223270006_R01C01 0.9075331 0.5966878 0.3295384 0.3544795 0.9262370 0.62679274
## 200223270006_R04C01 0.9318820 0.8939599 0.8541667 0.5997840 0.9240497 0.06970175
## cg01667144 cg13080267 cg02494911 cg10750306 cg11438323 cg06715136
## 200223270003_R02C01 0.8971484 0.78936656 0.3049435 0.04919915 0.4863471 0.3400192
## 200223270003_R03C01 0.3175389 0.78371483 0.2416332 0.55160081 0.8984559 0.9259109
## 200223270003_R06C01 0.9238364 0.09436069 0.2520909 0.54694332 0.8722772 0.9079807
## 200223270003_R07C01 0.8739442 0.09351259 0.2457032 0.59824543 0.5026756 0.6782105
## 200223270006_R01C01 0.2931961 0.45173796 0.8045030 0.53158639 0.8809646 0.8369052
## 200223270006_R04C01 0.8616530 0.49866715 0.7489283 0.05646838 0.8717937 0.8807568
## cg04412904 cg12738248 cg03071582 cg05570109 cg15775217 cg24873924
## 200223270003_R02C01 0.05088595 0.85430866 0.9187811 0.3466611 0.5707441 0.3060635
## 200223270003_R03C01 0.07717659 0.88010292 0.5844421 0.5866750 0.9168327 0.8640985
## 200223270003_R06C01 0.08253743 0.51121855 0.6245558 0.4046471 0.6042521 0.8259149
## 200223270003_R07C01 0.06217431 0.09131476 0.9283683 0.6014355 0.9062231 0.8333940
## 200223270006_R01C01 0.11888769 0.91529345 0.5715416 0.5774881 0.9083515 0.8761177
## 200223270006_R04C01 0.08885846 0.91911405 0.6534650 0.8756826 0.6383270 0.8585363
## cg17738613 cg01921484 cg10369879 cg27341708 cg12534577 cg18821122
## 200223270003_R02C01 0.6879612 0.90985496 0.9218784 0.48846610 0.8585231 0.9291309
## 200223270003_R03C01 0.6582258 0.90931369 0.3149306 0.02613847 0.8493466 0.5901603
## 200223270003_R06C01 0.1022257 0.92044873 0.9141081 0.86893582 0.8395241 0.5779620
## 200223270003_R07C01 0.8960156 0.91674311 0.9054415 0.02642300 0.8511384 0.9251431
## 200223270006_R01C01 0.8850702 0.02943747 0.2917862 0.47573455 0.8804655 0.9217018
## 200223270006_R04C01 0.8481916 0.89057041 0.9200403 0.89411974 0.3029013 0.5412250
## cg12682323 cg05234269 cg20685672 cg12228670 cg11331837 cg01680303
## 200223270003_R02C01 0.9397956 0.93848584 0.67121006 0.8632174 0.03692842 0.5095174
## 200223270003_R03C01 0.9003940 0.57461229 0.79320906 0.8496212 0.57150125 0.1344941
## 200223270003_R06C01 0.9157877 0.02467208 0.66136456 0.8738949 0.03182862 0.7573869
## 200223270003_R07C01 0.9048877 0.56516794 0.80838304 0.8362189 0.03832164 0.4772204
## 200223270006_R01C01 0.1065347 0.94829529 0.08291414 0.8079694 0.93008298 0.1176263
## 200223270006_R04C01 0.8836232 0.56298286 0.84460055 0.6966666 0.54004452 0.5133033
## cg17421046 cg03088219 cg02356645 cg00322003 cg01013522 cg00272795
## 200223270003_R02C01 0.9026993 0.844002862 0.5105903 0.1759911 0.6251168 0.46365138
## 200223270003_R03C01 0.9112100 0.007435243 0.5833923 0.5702070 0.8862821 0.82839260
## 200223270003_R06C01 0.8952031 0.120155222 0.5701428 0.3077122 0.5425308 0.07231279
## 200223270003_R07C01 0.9268852 0.826554308 0.5683381 0.6104341 0.8429862 0.78303831
## 200223270006_R01C01 0.1118337 0.066294915 0.5233692 0.6147419 0.0480531 0.78219952
## 200223270006_R04C01 0.4174370 0.574738383 0.9188670 0.2293759 0.8240222 0.44408249
## cg25758034 cg26474732 cg16579946 cg07523188 cg11187460 cg14527649
## 200223270003_R02C01 0.6114028 0.7843252 0.6306315 0.7509183 0.03672179 0.2678912
## 200223270003_R03C01 0.6649219 0.8184088 0.6648766 0.1524386 0.92516409 0.7954683
## 200223270003_R06C01 0.2393844 0.7358417 0.6455081 0.7127592 0.03109553 0.8350610
## 200223270003_R07C01 0.7071501 0.7509296 0.8979650 0.8464983 0.53283119 0.8428684
## 200223270006_R01C01 0.2301078 0.8294938 0.6886498 0.7847738 0.54038146 0.8231348
## 200223270006_R04C01 0.6891513 0.8033167 0.6766907 0.8231277 0.91096169 0.8022444
## cg20370184 cg17429539 cg20507276 cg13885788 cg16178271 cg10738648
## 200223270003_R02C01 0.37710950 0.7860900 0.12238910 0.9380618 0.6445416 0.44931577
## 200223270003_R03C01 0.05737964 0.7100923 0.38721972 0.9369476 0.6178075 0.49894016
## 200223270003_R06C01 0.04740505 0.7660838 0.47978438 0.5163017 0.6641952 0.05552024
## 200223270003_R07C01 0.83572095 0.6984969 0.02261996 0.9183376 0.7148058 0.03730440
## 200223270006_R01C01 0.04056608 0.6508597 0.37465798 0.5525542 0.6138954 0.54952781
## 200223270006_R04C01 0.04038589 0.2828452 0.03570795 0.9328289 0.9414188 0.59358167
## cg26069044 cg25879395 cg06112204 cg23161429 cg25436480 cg26757229
## 200223270003_R02C01 0.92401867 0.88130864 0.5251592 0.8956965 0.84251599 0.6723726
## 200223270003_R03C01 0.94072227 0.02603438 0.8773488 0.9099619 0.49940321 0.1422661
## 200223270003_R06C01 0.93321315 0.91060615 0.8867975 0.8833895 0.34943119 0.7933794
## 200223270003_R07C01 0.56567694 0.89205942 0.5613799 0.9134709 0.85244913 0.8074830
## 200223270006_R01C01 0.94369927 0.47886249 0.9184122 0.8738558 0.44545117 0.5265692
## 200223270006_R04C01 0.02040391 0.02145248 0.9152514 0.9104210 0.02575036 0.7341953
## cg02932958 cg18339359 cg23916408 cg06950937 cg12784167 cg07480176
## 200223270003_R02C01 0.7901008 0.8824858 0.1942275 0.8910968 0.81503498 0.5171664
## 200223270003_R03C01 0.4210489 0.9040272 0.9154993 0.2889345 0.02811410 0.3760452
## 200223270003_R06C01 0.3825995 0.8552121 0.8886255 0.9143801 0.03073269 0.6998389
## 200223270003_R07C01 0.7617081 0.3073106 0.8872447 0.8891079 0.84775699 0.2189042
## 200223270006_R01C01 0.8431126 0.8973742 0.2219945 0.8868617 0.83825789 0.5570021
## 200223270006_R04C01 0.7610084 0.2292800 0.1520624 0.9093273 0.45475291 0.4501196
## cg15865722 cg27577781 cg05321907 cg03660162 cg07138269 cg20139683
## 200223270003_R02C01 0.89438595 0.8143535 0.2880477 0.8691767 0.5002290 0.8717075
## 200223270003_R03C01 0.90194372 0.8113185 0.1782629 0.5160770 0.9426707 0.9059433
## 200223270003_R06C01 0.92118977 0.8144274 0.8427929 0.9026304 0.5057781 0.8962554
## 200223270003_R07C01 0.09230759 0.7970617 0.8320504 0.5305691 0.9400527 0.9218012
## 200223270006_R01C01 0.93422668 0.8640044 0.2422218 0.9257451 0.9321602 0.1708472
## 200223270006_R04C01 0.92220002 0.8840237 0.2429551 0.8935772 0.9333501 0.1067122
## cg12284872 cg03327352 cg23658987 cg21854924 cg21697769 cg19512141
## 200223270003_R02C01 0.8008333 0.8851712 0.79757644 0.8729132 0.8946108 0.8209161
## 200223270003_R03C01 0.7414569 0.8786878 0.07511718 0.7162342 0.2822953 0.7903543
## 200223270003_R06C01 0.7725267 0.3042310 0.10177571 0.7520990 0.8698740 0.8404684
## 200223270003_R07C01 0.7573369 0.8273211 0.46747992 0.8641284 0.9134887 0.2202759
## 200223270006_R01C01 0.7201607 0.8774082 0.76831297 0.6498895 0.2683820 0.8059589
## 200223270006_R04C01 0.8021446 0.8829492 0.08988532 0.5943113 0.2765740 0.7020247
## cg08198851 cg00675157 cg01153376 cg01933473 cg12776173 cg14564293
## 200223270003_R02C01 0.6578905 0.9188438 0.4872148 0.2589014 0.10388038 0.52089591
## 200223270003_R03C01 0.6578186 0.9242325 0.9639670 0.6726133 0.87306345 0.04000662
## 200223270003_R06C01 0.1272153 0.9254708 0.2242410 0.2642560 0.70094907 0.04959460
## 200223270003_R07C01 0.8351465 0.5447244 0.5155654 0.1978068 0.11367159 0.03114773
## 200223270006_R01C01 0.8791156 0.5173554 0.9588916 0.7599441 0.09458405 0.51703196
## 200223270006_R04C01 0.1423737 0.9247232 0.9586876 0.7405661 0.86532175 0.51535010
## cg24851651 cg22274273 cg25561557 cg21209485 cg10985055 cg14293999
## 200223270003_R02C01 0.03674702 0.4209386 0.76736369 0.8865053 0.8518169 0.2836710
## 200223270003_R03C01 0.05358297 0.4246379 0.03851635 0.8714878 0.8631895 0.9172023
## 200223270003_R06C01 0.05968923 0.4196796 0.47259480 0.2292550 0.5456633 0.9168166
## 200223270003_R07C01 0.60864179 0.4164100 0.43364249 0.2351526 0.8825100 0.9188336
## 200223270006_R01C01 0.08825834 0.7951105 0.46211439 0.8882046 0.8841690 0.1971116
## 200223270006_R04C01 0.91932068 0.0229810 0.44651530 0.2292483 0.8407797 0.9030919
## cg18819889 cg24506579 cg19377607 cg06697310 cg00696044 cg01549082
## 200223270003_R02C01 0.9156157 0.5244337 0.05377464 0.8454609 0.55608424 0.2924138
## 200223270003_R03C01 0.9004455 0.5794845 0.90570746 0.8653044 0.07552381 0.7065693
## 200223270003_R06C01 0.9054439 0.9427785 0.06636174 0.2405168 0.79270858 0.2895440
## 200223270003_R07C01 0.9089935 0.9323844 0.68788639 0.8479193 0.03548419 0.6422955
## 200223270006_R01C01 0.9065397 0.9185355 0.06338988 0.8206613 0.10714386 0.8471236
## 200223270006_R04C01 0.9242767 0.4332642 0.91551446 0.7839595 0.18420803 0.6949888
## cg01128042 cg00999469 cg06118351 cg12012426 cg08584917 cg27272246
## 200223270003_R02C01 0.9113420 0.3274080 0.36339400 0.9165048 0.5663205 0.8615873
## 200223270003_R03C01 0.5328806 0.2857719 0.47148604 0.9434768 0.9019732 0.8705287
## 200223270003_R06C01 0.5222757 0.2499229 0.86559618 0.9220044 0.9187789 0.8103777
## 200223270003_R07C01 0.5141721 0.2819622 0.83494303 0.9241284 0.6007449 0.0310881
## 200223270006_R01C01 0.9321215 0.2933539 0.02632111 0.9327143 0.9069098 0.7686536
## 200223270006_R04C01 0.5050081 0.2966623 0.83329300 0.9271167 0.9263584 0.4403542
## cg15633912 cg16788319 cg17906851 cg07028768 cg27086157 cg14240646
## 200223270003_R02C01 0.1605530 0.9379870 0.9488392 0.4496851 0.9224112 0.5391334
## 200223270003_R03C01 0.9333421 0.8913429 0.9529718 0.8536078 0.9219304 0.2538363
## 200223270003_R06C01 0.8737362 0.8680680 0.6462151 0.8356936 0.3224986 0.1864902
## 200223270003_R07C01 0.9137334 0.8811444 0.9553497 0.4245893 0.3455486 0.6402007
## 200223270006_R01C01 0.9169706 0.3123481 0.6222117 0.8835151 0.8988962 0.7696079
## 200223270006_R04C01 0.8890004 0.2995627 0.6441202 0.4514661 0.9159217 0.1490028
## cg00154902 cg14307563 cg02320265 cg08779649 cg04664583 cg12466610
## 200223270003_R02C01 0.5137741 0.1855966 0.8853213 0.44449401 0.5572814 0.05767659
## 200223270003_R03C01 0.8540746 0.8916957 0.4686314 0.45076825 0.5881190 0.59131778
## 200223270003_R06C01 0.8188126 0.8750052 0.4838749 0.04810217 0.9352717 0.06939623
## 200223270003_R07C01 0.4625776 0.8975663 0.8986848 0.42715969 0.9350230 0.04527733
## 200223270006_R01C01 0.4690086 0.8762842 0.8987560 0.89313476 0.9424588 0.05212904
## 200223270006_R04C01 0.4547219 0.9168614 0.4768520 0.59523771 0.9379537 0.05104033
## cg27639199 cg15501526 cg00689685 cg01413796 cg11247378 age.now
## 200223270003_R02C01 0.67515415 0.6362531 0.7019389 0.1345128 0.1591185 82.40000
## 200223270003_R03C01 0.67552763 0.6319253 0.8634268 0.2830672 0.7874849 78.60000
## 200223270003_R06C01 0.06233093 0.7435100 0.6378795 0.8194681 0.4807942 80.40000
## 200223270003_R07C01 0.05701332 0.7756577 0.8624541 0.9007710 0.4537348 78.16441
## 200223270006_R01C01 0.05037694 0.3230777 0.6361891 0.2603027 0.1537079 62.90000
## 200223270006_R04C01 0.08144161 0.8342695 0.6356260 0.9207672 0.1686356 80.67796
print(df_process_frequency_FeatureName)
## [1] "PC1" "PC2" "PC3" "cg00962106" "cg02225060" "cg14710850" "cg27452255"
## [8] "cg02981548" "cg08861434" "cg19503462" "cg07152869" "cg16749614" "cg05096415" "cg23432430"
## [15] "cg17186592" "cg00247094" "cg09584650" "cg11133939" "cg16715186" "cg03129555" "cg08857872"
## [22] "cg06864789" "cg14924512" "cg16652920" "cg03084184" "cg26219488" "cg20913114" "cg06378561"
## [29] "cg26948066" "cg25259265" "cg06536614" "cg24859648" "cg12279734" "cg03982462" "cg05841700"
## [36] "cg11227702" "cg12146221" "cg02621446" "cg00616572" "cg15535896" "cg02372404" "cg09854620"
## [43] "cg04248279" "cg20678988" "cg24861747" "cg10240127" "cg16771215" "cg01667144" "cg13080267"
## [50] "cg02494911" "cg10750306" "cg11438323" "cg06715136" "cg04412904" "cg12738248" "cg03071582"
## [57] "cg05570109" "cg15775217" "cg24873924" "cg17738613" "cg01921484" "cg10369879" "cg27341708"
## [64] "cg12534577" "cg18821122" "cg12682323" "cg05234269" "cg20685672" "cg12228670" "cg11331837"
## [71] "cg01680303" "cg17421046" "cg03088219" "cg02356645" "cg00322003" "cg01013522" "cg00272795"
## [78] "cg25758034" "cg26474732" "cg16579946" "cg07523188" "cg11187460" "cg14527649" "cg20370184"
## [85] "cg17429539" "cg20507276" "cg13885788" "cg16178271" "cg10738648" "cg26069044" "cg25879395"
## [92] "cg06112204" "cg23161429" "cg25436480" "cg26757229" "cg02932958" "cg18339359" "cg23916408"
## [99] "cg06950937" "cg12784167" "cg07480176" "cg15865722" "cg27577781" "cg05321907" "cg03660162"
## [106] "cg07138269" "cg20139683" "cg12284872" "cg03327352" "cg23658987" "cg21854924" "cg21697769"
## [113] "cg19512141" "cg08198851" "cg00675157" "cg01153376" "cg01933473" "cg12776173" "cg14564293"
## [120] "cg24851651" "cg22274273" "cg25561557" "cg21209485" "cg10985055" "cg14293999" "cg18819889"
## [127] "cg24506579" "cg19377607" "cg06697310" "cg00696044" "cg01549082" "cg01128042" "cg00999469"
## [134] "cg06118351" "cg12012426" "cg08584917" "cg27272246" "cg15633912" "cg16788319" "cg17906851"
## [141] "cg07028768" "cg27086157" "cg14240646" "cg00154902" "cg14307563" "cg02320265" "cg08779649"
## [148] "cg04664583" "cg12466610" "cg27639199" "cg15501526" "cg00689685" "cg01413796" "cg11247378"
## [155] "age.now"
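The frequency (majority-vote) rule used above can be sketched on a small toy table; all feature names and indicator values here are hypothetical, not taken from the real run:

```r
# Toy sketch of the majority-vote rule: a feature is kept when it appears in
# the top-importance list of at least 3 of the 5 models. Each column is a 0/1
# indicator of whether that model selected the feature.
toy <- data.frame(
  LRM = c(1, 1, 0), XGB = c(1, 0, 0), ENM = c(1, 1, 1),
  RF  = c(1, 0, 0), SVM = c(1, 1, 0),
  row.names = c("cgA", "cgB", "cgC")
)
toy$Total_Count <- rowSums(toy[, c("LRM", "XGB", "ENM", "RF", "SVM")])
kept <- rownames(toy[toy$Total_Count >= 3, ])
print(kept)  # "cgA" "cgB"
```

Here `cgA` is selected by all five models and `cgB` by exactly three, so both pass the threshold, while `cgC` (one model) is dropped.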
Selected_Frequency_Feature_importance <-all_Output_combined_df_impAvg[all_Output_combined_df_impAvg$Total_Count>=3,]
print(Selected_Frequency_Feature_importance)
## Feature LRM XGB ENM RF SVM Total_Count Importance_LRM1 Importance_XGB Importance_ENM1
## 1 age.now 1 1 1 1 1 5 0.00000000 1.000000000 0.000000000
## 2 cg00154902 1 1 1 1 1 5 0.08879263 0.268834896 0.371315930
## 3 cg00247094 1 1 1 1 1 5 0.41278095 0.218540778 0.424503057
## 4 cg00272795 1 1 1 1 1 5 0.21295491 0.198550953 0.230999937
## 5 cg00322003 1 1 1 1 1 5 0.21752832 0.146570216 0.343053068
## 6 cg00616572 1 1 1 1 1 5 0.28381319 0.171559535 0.357284511
## 7 cg00675157 1 1 1 1 1 5 0.14546770 0.091681870 0.329939993
## 8 cg00689685 1 1 1 1 1 5 0.04219397 0.101123505 0.210886561
## 9 cg00696044 1 1 1 1 1 5 0.12999136 0.173039136 0.281254843
## 10 cg00962106 1 1 1 1 1 5 0.62818798 0.527922660 0.727596599
## 11 cg00999469 1 1 1 1 1 5 0.11924999 0.172508732 0.193113515
## 12 cg01013522 1 1 1 1 1 5 0.21592960 0.241888711 0.316918954
## 13 cg01128042 1 1 1 1 1 5 0.12518582 0.423396901 0.284997767
## 14 cg01153376 1 1 1 1 1 5 0.14471902 0.331692564 0.329009655
## 15 cg01413796 1 1 1 1 1 5 0.02161027 0.256050130 0.105363369
## 16 cg01549082 1 1 1 1 1 5 0.12522345 0.157926568 0.009399384
## 17 cg01667144 1 1 1 1 1 5 0.26504058 0.258542574 0.297490033
## 18 cg01680303 1 1 1 1 1 5 0.22195009 0.143689906 0.298057362
## 19 cg01921484 1 1 1 1 1 5 0.23359943 0.448455125 0.384660546
## 20 cg01933473 1 1 1 1 1 5 0.14405977 0.138623149 0.165864840
## 21 cg02225060 1 1 1 1 1 5 0.50844099 0.189218518 0.617365165
## 22 cg02320265 1 1 1 1 1 5 0.07922128 0.200640351 0.176092180
## 23 cg02356645 1 1 1 1 1 5 0.21753719 0.174918028 0.301971691
## 24 cg02372404 1 1 1 1 1 5 0.27764868 0.159713319 0.450667096
## 25 cg02494911 1 1 1 1 1 5 0.26111760 0.365368878 0.332216396
## 26 cg02621446 1 1 1 1 1 5 0.28474428 0.413508159 0.346649379
## 27 cg02932958 1 1 1 1 1 5 0.18341414 0.030277794 0.271954786
## 28 cg02981548 1 1 1 1 1 5 0.48692573 0.409443001 0.586910968
## 29 cg03071582 1 1 1 1 1 5 0.23936745 0.097173746 0.269629005
## 30 cg03084184 1 1 1 1 1 5 0.34207903 0.177639114 0.391669668
## 31 cg03088219 1 1 1 1 1 5 0.21783076 0.266905948 0.252647943
## 32 cg03129555 1 1 1 1 1 5 0.38181198 0.262127746 0.341877381
## 33 cg03327352 1 1 1 1 1 5 0.15990027 0.379266748 0.308022027
## 34 cg03660162 1 1 1 1 1 5 0.16307948 0.075703296 0.368700827
## 35 cg03982462 1 1 1 1 1 5 0.30254946 0.041317912 0.442784617
## 36 cg04248279 1 1 1 1 1 5 0.27148602 0.351871408 0.328078798
## 37 cg04412904 1 1 1 1 1 5 0.24639764 0.482166922 0.341662549
## 38 cg04664583 1 1 1 1 1 5 0.07389089 0.025123400 0.197066667
## 39 cg05096415 1 1 1 1 1 5 0.44533457 0.586739835 0.414828632
## 40 cg05234269 1 1 1 1 1 5 0.22837191 0.172234686 0.338291448
## 41 cg05321907 1 1 1 1 1 5 0.16624900 0.066630222 0.227854962
## 42 cg05570109 1 1 1 1 1 5 0.23797548 0.144787573 0.385926710
## 43 cg05841700 1 1 1 1 1 5 0.30170848 0.155412051 0.349467184
## 44 cg06112204 1 1 1 1 1 5 0.19129682 0.020074093 0.239077055
## 45 cg06118351 1 1 1 1 1 5 0.11824394 0.057558412 0.263145036
## 46 cg06378561 1 1 1 1 1 5 0.33046434 0.254873358 0.319057217
## 47 cg06536614 1 1 1 1 1 5 0.32793798 0.202932339 0.433426126
## 48 cg06697310 1 1 1 1 1 5 0.13063945 0.237623033 0.326843827
## 49 cg06715136 1 1 1 1 1 5 0.24950260 0.191920615 0.349420903
## 50 cg06864789 1 1 1 1 1 5 0.36409272 0.503835929 0.460586631
## 51 cg06950937 1 1 1 1 1 5 0.18053327 0.277466641 0.235188883
## 52 cg07028768 1 1 1 1 1 5 0.10714823 0.116925946 0.398410973
## 53 cg07138269 1 1 1 1 1 5 0.16208302 0.092407396 0.307628197
## 54 cg07152869 1 1 1 1 1 5 0.46401454 0.212363413 0.539218307
## 55 cg07480176 1 1 1 1 1 5 0.17577373 0.009422593 0.271014784
## 56 cg07523188 1 1 1 1 1 5 0.20694262 0.080073043 0.293838589
## 57 cg08198851 1 1 1 1 1 5 0.14937223 0.289014103 0.281424533
## 58 cg08584917 1 1 1 1 1 5 0.11176227 0.138616247 0.299594690
## 59 cg08779649 1 1 1 1 1 5 0.07612126 0.161307390 0.166351371
## 60 cg08857872 1 1 1 1 1 5 0.38088074 0.467593325 0.531139947
## 61 cg08861434 1 1 1 1 1 5 0.48302924 0.235876875 0.493132245
## 62 cg09584650 1 1 1 1 1 5 0.41042125 0.457112879 0.477067244
## 63 cg09854620 1 1 1 1 1 5 0.27298526 0.183209797 0.344318733
## 64 cg10240127 1 1 1 1 1 5 0.27027622 0.213287988 0.425197502
## 65 cg10369879 1 1 1 1 1 5 0.23210094 0.183817567 0.316828724
## 66 cg10738648 1 1 1 1 1 5 0.19474268 0.224614703 0.270553023
## 67 cg10750306 1 1 1 1 1 5 0.25985394 0.086335295 0.291945624
## 68 cg10985055 1 1 1 1 1 5 0.13743727 0.102700187 0.162868541
## 69 cg11133939 1 1 1 1 1 5 0.40082643 0.201244345 0.473727841
## 70 cg11187460 1 1 1 1 1 5 0.20693049 0.145437291 0.183459129
## 71 cg11227702 1 1 1 1 1 5 0.29375910 0.049153501 0.304770879
## 72 cg11247378 1 1 1 1 1 5 0.01493278 0.195085308 0.258188066
## 73 cg11331837 1 1 1 1 1 5 0.22214980 0.219715757 0.276642306
## 74 cg11438323 1 1 1 1 1 5 0.24970672 0.127152455 0.289005707
## 75 cg12012426 1 1 1 1 1 5 0.11249753 0.317869115 0.228603140
## 76 cg12146221 1 1 1 1 1 5 0.28574803 0.361724220 0.305210343
## Importance_RF Importance_SVM Average_Importance
## 1 0.45372158 0.8333333 0.4574110
## 2 0.33502754 0.5833333 0.3294609
## 3 0.23013585 0.5833333 0.3738588
## 4 0.09024509 0.3333333 0.2132168
## 5 0.27821774 0.5833333 0.3137405
## 6 0.17891065 0.6666667 0.3316469
## 7 0.16641199 0.5000000 0.2467003
## 8 0.16712976 0.4166667 0.1876001
## 9 0.14117479 0.4166667 0.2284254
## 10 0.46613808 0.2500000 0.5199691
## 11 0.26372499 0.7500000 0.2997194
## 12 0.25694826 0.6666667 0.3396704
## 13 0.22999345 0.6666667 0.3460481
## 14 0.72836889 0.2500000 0.3567580
## 15 0.08143128 0.1666667 0.1262243
## 16 0.25808487 0.0000000 0.1101269
## 17 0.24489722 0.4166667 0.2965274
## 18 0.24811633 0.6666667 0.3156961
## 19 0.41690692 0.7500000 0.4467244
## 20 0.15765991 0.3333333 0.1879082
## 21 0.23091637 0.4166667 0.3925215
## 22 0.23921737 0.4166667 0.2223676
## 23 0.05315135 0.6666667 0.2828490
## 24 0.20775241 0.5833333 0.3358230
## 25 0.39679341 0.5000000 0.3710993
## 26 0.33321305 0.5000000 0.3756230
## 27 0.08617858 0.5833333 0.2310317
## 28 0.22577927 0.4166667 0.4251451
## 29 0.15550558 0.4166667 0.2356685
## 30 0.31009075 0.3333333 0.3109624
## 31 0.15279655 0.4166667 0.2613696
## 32 0.11859695 0.5833333 0.3375495
## 33 0.23579212 0.4166667 0.2999296
## 34 0.11594685 0.5833333 0.2613528
## 35 0.21066649 0.4166667 0.2827970
## 36 0.12491500 0.3333333 0.2819369
## 37 0.28390081 0.4166667 0.3541589
## 38 0.33357822 0.3333333 0.1925985
## 39 0.28495711 0.7500000 0.4963720
## 40 0.26421598 0.6666667 0.3339561
## 41 0.26338041 0.5000000 0.2448229
## 42 0.38455253 0.5000000 0.3306485
## 43 0.01805247 0.6666667 0.2982614
## 44 0.14283075 0.5000000 0.2186557
## 45 0.29018940 0.4166667 0.2291607
## 46 0.11176792 0.5833333 0.3198992
## 47 0.04798536 0.5000000 0.3024564
## 48 0.24776591 0.5833333 0.3052411
## 49 0.20299016 0.5833333 0.3154335
## 50 0.46960918 0.5000000 0.4596249
## 51 0.18893246 0.3333333 0.2430909
## 52 0.21571670 0.4166667 0.2509737
## 53 0.06844617 0.4166667 0.2094463
## 54 0.17108416 0.6666667 0.4106694
## 55 0.16116737 0.5833333 0.2401424
## 56 0.34960936 0.5000000 0.2860927
## 57 0.16855829 0.5000000 0.2776738
## 58 0.08221573 0.3333333 0.1931045
## 59 0.13947791 0.5000000 0.2086516
## 60 0.56010360 0.4166667 0.4712769
## 61 0.14267663 0.6666667 0.4042763
## 62 0.23050169 0.5833333 0.4316873
## 63 0.32387790 0.5000000 0.3248783
## 64 0.31291846 0.5000000 0.3443360
## 65 0.29168017 0.5000000 0.3048855
## 66 0.32817697 0.5000000 0.3036175
## 67 0.07977978 0.5833333 0.2602496
## 68 0.26635858 0.3333333 0.2005396
## 69 0.34044653 0.5000000 0.3832490
## 70 0.22891961 0.4166667 0.2362826
## 71 0.04105989 0.4166667 0.2210820
## 72 0.15726047 0.5833333 0.2417600
## 73 0.33393787 0.3333333 0.2771558
## 74 0.03589151 0.5833333 0.2570179
## 75 0.07484149 0.4166667 0.2300956
## 76 0.21590054 0.5000000 0.3337166
## [ reached 'max' / getOption("max.print") -- omitted 79 rows ]
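The mean-importance ranking that produces `Average_Importance` can be sketched with hypothetical values: the per-model importance columns are averaged row-wise, and features are then ordered by that average before the top N are kept.

```r
# Toy sketch (hypothetical values) of the mean-importance selection method:
# average the per-model importances, then rank features by the average.
imp <- data.frame(
  Feature         = c("cgA", "cgB", "cgC"),
  Importance_LRM1 = c(0.2, 0.8, 0.5),
  Importance_XGB  = c(0.4, 0.6, 0.1)
)
imp$Average_Importance <- rowMeans(imp[, c("Importance_LRM1", "Importance_XGB")])
imp_ordered <- imp[order(-imp$Average_Importance), ]
print(head(imp_ordered$Feature, 2))  # the top-2 features by mean importance
```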
# Output the data frame of features selected by the mean method:
# "selected_impAvg_ordered_NAME". Note this data frame does not have a column named "SampleID".
if(Flag_8mean){
filename_mean <- paste0("Selected_mean", "_", INPUT_NUMBER_FEATURES, "_Features.csv")
OUTPUTPATH_mean <- paste0(OUTUT_CSV_PATHNAME, filename_mean)
if (file.exists(OUTPUTPATH_mean)) {
print("selected file based on mean already exists")}
else {
write.csv(df_selected_Mean,
file = OUTPUTPATH_mean,
row.names = FALSE)
}
}
if(Flag_8median){
filename_median <- paste0("Selected_median", "_", INPUT_NUMBER_FEATURES, "_Features.csv")
OUTPUTPATH_median <- paste0(OUTUT_CSV_PATHNAME, filename_median)
if (file.exists(OUTPUTPATH_median)) {
print("selected file based on median already exists")}
else {
write.csv(df_selected_Median,
file = OUTPUTPATH_median,
row.names = FALSE)
}
}
if(Flag_8Fequency){
filename_frequency <- paste0("Selected_frequency", "_", INPUT_NUMBER_FEATURES, "_Features.csv")
OUTPUTPATH_frequency <- paste0(OUTUT_CSV_PATHNAME, filename_frequency)
if (file.exists(OUTPUTPATH_frequency)) {
print("selected file based on frequency already exists")}
else {
write.csv(df_process_Output_freq,
file = OUTPUTPATH_frequency,
row.names = FALSE)
}
}
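The three blocks above repeat the same write-if-absent pattern (and two of them reuse the "frequency" message). A small helper could remove the duplication; `write_if_absent` is a hypothetical name, not a function used elsewhere in this pipeline, and this is only a sketch of the refactoring:

```r
# Write `df` to `path` unless the file already exists; report which
# selection method ("mean", "median", "frequency") the file belongs to.
write_if_absent <- function(df, path, label) {
  if (file.exists(path)) {
    message("selected file based on ", label, " already exists")
    return(invisible(FALSE))
  }
  write.csv(df, file = path, row.names = FALSE)
  invisible(TRUE)
}

# Usage with the flags and data frames defined above (sketch):
# if (Flag_8mean)      write_if_absent(df_selected_Mean,       OUTPUTPATH_mean,      "mean")
# if (Flag_8median)    write_if_absent(df_selected_Median,     OUTPUTPATH_median,    "median")
# if (Flag_8Fequency)  write_if_absent(df_process_Output_freq, OUTPUTPATH_frequency, "frequency")
```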
# This is the flag for the phenotype data output:
# if set to TRUE, the file is output; the code first checks whether the file already exists at the given path and writes it only if it does not.
# if set to FALSE, the phenotype file is not output.
# NOTICE THAT: the phenotype file is selected from "merged_df_raw".
phenotypeDF<-merged_df_raw[,colnames(phenoticPart_RAW)]
print(head(phenotypeDF))
## barcodes RID.a prop.B prop.NK prop.CD4T prop.CD8T
## 200223270003_R02C01 200223270003_R02C01 2190 0.03164651 0.03609239 0.010771839 0.01481567
## 200223270003_R03C01 200223270003_R03C01 4080 0.03556363 0.04697771 0.002321312 0.06381941
## 200223270003_R06C01 200223270003_R06C01 4505 0.07129589 0.04412218 0.037684081 0.11457236
## 200223270003_R07C01 200223270003_R07C01 1010 0.02081699 0.07117668 0.040966085 0.00000000
## 200223270006_R01C01 200223270006_R01C01 4226 0.02680465 0.04767947 0.128514873 0.09085886
## 200223270006_R04C01 200223270006_R04C01 1190 0.07063013 0.05250647 0.064529118 0.04309168
## prop.Mono prop.Neutro prop.Eosino DX age.now PTGENDER ABETA TAU
## 200223270003_R02C01 0.06533409 0.8413395 0 MCI 82.40000 Male 963.2 341.5
## 200223270003_R03C01 0.04901806 0.8022999 0 CN 78.60000 Female 950.6 295.9
## 200223270003_R06C01 0.08745402 0.6448715 0 CN 80.40000 Female 1705.0 353.2
## 200223270003_R07C01 0.04459325 0.8224470 0 Dementia 78.16441 Male 493.3 272.8
## 200223270006_R01C01 0.07419209 0.6319501 0 MCI 62.90000 Female 1705.0 253.1
## 200223270006_R04C01 0.08796080 0.6812818 0 CN 80.67796 Female 1336.0 439.3
## PTAU PC1 PC2 PC3 ageGroup ageGroupsq DX_num
## 200223270003_R02C01 35.48 -0.214185447 1.470293e-02 -0.014043316 0.6606949 0.43651772 0
## 200223270003_R03C01 28.08 -0.172761185 5.745834e-02 0.005055871 0.2806949 0.07878961 0
## 200223270003_R06C01 28.49 -0.003667305 8.372861e-02 0.029143653 0.4606949 0.21223977 0
## 200223270003_R07C01 22.75 -0.186779607 -1.117250e-02 -0.032302430 0.2371357 0.05623333 1
## 200223270006_R01C01 22.84 0.026814649 1.650735e-05 0.052947950 -1.2893051 1.66230770 0
## 200223270006_R04C01 40.78 -0.037862929 1.571950e-02 -0.008685676 0.4884909 0.23862336 0
## uniqueID Horvath
## 200223270003_R02C01 1 61.50365
## 200223270003_R03C01 1 69.26678
## 200223270003_R06C01 1 96.84418
## 200223270003_R07C01 1 61.76446
## 200223270006_R01C01 1 59.33885
## 200223270006_R04C01 1 70.27197
OUTPUTPATH_phenotypePart <- paste0(OUTUT_CSV_PATHNAME, "PhenotypePart_df.csv")
if(phenoOutPUt_FLAG ){
if (file.exists(OUTPUTPATH_phenotypePart)) {
print("Phenotype File already exists")}
else {
write.csv(phenotypeDF, file = OUTPUTPATH_phenotypePart, row.names = FALSE)
}
}
## [1] "Phenotype File already exists"
Performance of the selected output features based on the Mean method
processed_dataFrame<-df_selected_Mean
processed_data<-output_mean_process
AfterProcess_FeatureName<-selected_impAvg_ordered_NAME
print(head(output_mean_process))
## # A tibble: 6 × 156
## DX PC1 cg00962106 PC2 cg05096415 cg08857872 cg23432430 cg16652920 cg06864789
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 MCI -0.214 0.912 0.0147 0.918 0.340 0.948 0.944 0.0537
## 2 CN -0.173 0.538 0.0575 0.518 0.818 0.946 0.943 0.461
## 3 CN -0.00367 0.504 0.0837 0.629 0.297 0.942 0.946 0.875
## 4 Dementia -0.187 0.904 -0.0112 0.606 0.295 0.943 0.942 0.490
## 5 MCI 0.0268 0.896 0.0000165 0.560 0.894 0.946 0.953 0.479
## 6 CN -0.0379 0.886 0.0157 0.544 0.890 0.951 0.949 0.0542
## # ℹ 147 more variables: age.now <dbl>, cg01921484 <dbl>, cg26948066 <dbl>, cg17186592 <dbl>,
## # cg09584650 <dbl>, cg12279734 <dbl>, cg02981548 <dbl>, cg14710850 <dbl>, PC3 <dbl>,
## # cg07152869 <dbl>, cg08861434 <dbl>, cg15501526 <dbl>, cg25259265 <dbl>, cg02225060 <dbl>,
## # cg24859648 <dbl>, cg11133939 <dbl>, cg25879395 <dbl>, cg02621446 <dbl>, cg00247094 <dbl>,
## # cg02494911 <dbl>, cg16771215 <dbl>, cg24861747 <dbl>, cg01153376 <dbl>, cg04412904 <dbl>,
## # cg20913114 <dbl>, cg01128042 <dbl>, cg10240127 <dbl>, cg14564293 <dbl>, cg16749614 <dbl>,
## # cg01013522 <dbl>, cg16579946 <dbl>, cg03129555 <dbl>, cg02372404 <dbl>, cg05234269 <dbl>, …
print(selected_impAvg_ordered_NAME)
## [1] "PC1" "cg00962106" "PC2" "cg05096415" "cg08857872" "cg23432430" "cg16652920"
## [8] "cg06864789" "age.now" "cg01921484" "cg26948066" "cg17186592" "cg09584650" "cg12279734"
## [15] "cg02981548" "cg14710850" "PC3" "cg07152869" "cg08861434" "cg15501526" "cg25259265"
## [22] "cg02225060" "cg24859648" "cg11133939" "cg25879395" "cg02621446" "cg00247094" "cg02494911"
## [29] "cg16771215" "cg24861747" "cg01153376" "cg04412904" "cg20913114" "cg01128042" "cg10240127"
## [36] "cg14564293" "cg16749614" "cg01013522" "cg16579946" "cg03129555" "cg02372404" "cg05234269"
## [43] "cg12146221" "cg12228670" "cg14924512" "cg27452255" "cg16715186" "cg00616572" "cg05570109"
## [50] "cg00154902" "cg14293999" "cg17421046" "cg15775217" "cg09854620" "cg19503462" "cg26757229"
## [57] "cg06378561" "cg01680303" "cg06715136" "cg15535896" "cg00322003" "cg27341708" "cg03084184"
## [64] "cg26219488" "cg18339359" "cg06697310" "cg10369879" "cg10738648" "cg06536614" "cg26069044"
## [71] "cg20685672" "cg03327352" "cg00999469" "cg23658987" "cg05841700" "cg01667144" "cg15865722"
## [78] "cg13885788" "cg14527649" "cg23161429" "cg20370184" "cg18821122" "cg07523188" "cg12534577"
## [85] "cg02356645" "cg03982462" "cg04248279" "cg13080267" "cg27639199" "cg08198851" "cg11331837"
## [92] "cg24873924" "cg20507276" "cg25561557" "cg22274273" "cg12682323" "cg17738613" "cg21209485"
## [99] "cg03088219" "cg03660162" "cg10750306" "cg27272246" "cg11438323" "cg12738248" "cg21854924"
## [106] "cg20139683" "cg16178271" "cg07028768" "cg26474732" "cg00675157" "cg23916408" "cg05321907"
## [113] "cg17429539" "cg06950937" "cg14240646" "cg27086157" "cg25758034" "cg11247378" "cg19377607"
## [120] "cg07480176" "cg27577781" "cg11187460" "cg03071582" "cg12284872" "cg02932958" "cg12012426"
## [127] "cg06118351" "cg00696044" "cg25436480" "cg02320265" "cg11227702" "cg18819889" "cg06112204"
## [134] "cg19512141" "cg24506579" "cg00272795" "cg21697769" "cg12776173" "cg07138269" "cg17906851"
## [141] "cg08779649" "cg10985055" "cg08584917" "cg04664583" "cg01933473" "cg00689685" "cg14307563"
## [148] "cg12784167" "cg24851651" "cg15633912" "cg12466610" "cg16788319" "cg20678988" "cg01413796"
## [155] "cg01549082"
print(head(df_selected_Mean))
## DX PC1 cg00962106 PC2 cg05096415 cg08857872
## 200223270003_R02C01 MCI -0.214185447 0.9124898 1.470293e-02 0.9182527 0.3395280
## 200223270003_R03C01 CN -0.172761185 0.5375751 5.745834e-02 0.5177819 0.8181845
## 200223270003_R06C01 CN -0.003667305 0.5040948 8.372861e-02 0.6288426 0.2970779
## 200223270003_R07C01 Dementia -0.186779607 0.9039029 -1.117250e-02 0.6060271 0.2954090
## 200223270006_R01C01 MCI 0.026814649 0.8961556 1.650735e-05 0.5599588 0.8935876
## 200223270006_R04C01 CN -0.037862929 0.8857597 1.571950e-02 0.5441200 0.8901338
## cg23432430 cg16652920 cg06864789 age.now cg01921484 cg26948066 cg17186592
## 200223270003_R02C01 0.9482702 0.9436000 0.05369415 82.40000 0.90985496 0.4685225 0.9230463
## 200223270003_R03C01 0.9455418 0.9431222 0.46053125 78.60000 0.90931369 0.5026045 0.8593448
## 200223270003_R06C01 0.9418716 0.9457161 0.87513655 80.40000 0.92044873 0.9101976 0.8467599
## 200223270003_R07C01 0.9426559 0.9419785 0.49020327 78.16441 0.91674311 0.9379543 0.4986373
## 200223270006_R01C01 0.9461736 0.9529417 0.47852685 62.90000 0.02943747 0.9120181 0.8978999
## 200223270006_R04C01 0.9508404 0.9492648 0.05423587 80.67796 0.89057041 0.8868608 0.9239750
## cg09584650 cg12279734 cg02981548 cg14710850 PC3 cg07152869
## 200223270003_R02C01 0.08230254 0.6435368 0.1342571 0.8048592 -0.014043316 0.8284151
## 200223270003_R03C01 0.09661586 0.1494651 0.5220037 0.8090950 0.005055871 0.5050630
## 200223270003_R06C01 0.52399749 0.8760759 0.5098965 0.8285902 0.029143653 0.8352490
## 200223270003_R07C01 0.11587211 0.8674214 0.5660985 0.8336457 -0.032302430 0.5194300
## 200223270006_R01C01 0.42115185 0.6454450 0.5678714 0.8500725 0.052947950 0.5025709
## 200223270006_R04C01 0.56043178 0.8660058 0.5079859 0.8207247 -0.008685676 0.8080916
## cg08861434 cg15501526 cg25259265 cg02225060 cg24859648 cg11133939
## 200223270003_R02C01 0.8768306 0.6362531 0.4356646 0.6828159 0.83777536 0.1282694
## 200223270003_R03C01 0.4352647 0.6319253 0.8893591 0.8265195 0.44392797 0.5920898
## 200223270003_R06C01 0.8698813 0.7435100 0.4201700 0.5209552 0.03341185 0.5127706
## 200223270003_R07C01 0.4709249 0.7756577 0.4455517 0.8078889 0.43582347 0.8474176
## 200223270006_R01C01 0.8618532 0.3230777 0.8423337 0.6084903 0.03087161 0.8589133
## 200223270006_R04C01 0.9058965 0.8342695 0.8460736 0.7638781 0.02588024 0.5246557
## cg25879395 cg02621446 cg00247094 cg02494911 cg16771215 cg24861747
## 200223270003_R02C01 0.88130864 0.8731313 0.5399349 0.3049435 0.88389723 0.3540897
## 200223270003_R03C01 0.02603438 0.8095534 0.9315640 0.2416332 0.07196933 0.4309505
## 200223270003_R06C01 0.91060615 0.7511582 0.5177874 0.2520909 0.09949974 0.8071462
## 200223270003_R07C01 0.89205942 0.8773609 0.5377765 0.2457032 0.64234023 0.3347317
## 200223270006_R01C01 0.47886249 0.2046541 0.9109309 0.8045030 0.62679274 0.3544795
## 200223270006_R04C01 0.02145248 0.7963817 0.5266535 0.7489283 0.06970175 0.5997840
## cg01153376 cg04412904 cg20913114 cg01128042 cg10240127 cg14564293
## 200223270003_R02C01 0.4872148 0.05088595 0.36510482 0.9113420 0.9250553 0.52089591
## 200223270003_R03C01 0.9639670 0.07717659 0.80382984 0.5328806 0.9403255 0.04000662
## 200223270003_R06C01 0.2242410 0.08253743 0.03158439 0.5222757 0.9056974 0.04959460
## 200223270003_R07C01 0.5155654 0.06217431 0.81256840 0.5141721 0.9396217 0.03114773
## 200223270006_R01C01 0.9588916 0.11888769 0.81502059 0.9321215 0.9262370 0.51703196
## 200223270006_R04C01 0.9586876 0.08885846 0.90468830 0.5050081 0.9240497 0.51535010
## cg16749614 cg01013522 cg16579946 cg03129555 cg02372404 cg05234269
## 200223270003_R02C01 0.8678741 0.6251168 0.6306315 0.6079616 0.03598249 0.93848584
## 200223270003_R03C01 0.8539348 0.8862821 0.6648766 0.5785498 0.02767285 0.57461229
## 200223270003_R06C01 0.5874127 0.5425308 0.6455081 0.9137818 0.03127855 0.02467208
## 200223270003_R07C01 0.5555391 0.8429862 0.8979650 0.9043041 0.55685785 0.56516794
## 200223270006_R01C01 0.8026346 0.0480531 0.6886498 0.9286357 0.02587736 0.94829529
## 200223270006_R04C01 0.7903978 0.8240222 0.6766907 0.9088564 0.02828648 0.56298286
## cg12146221 cg12228670 cg14924512 cg27452255 cg16715186 cg00616572
## 200223270003_R02C01 0.2049284 0.8632174 0.5303907 0.9001010 0.2742789 0.9335067
## 200223270003_R03C01 0.1814927 0.8496212 0.9160885 0.6593379 0.7946153 0.9214079
## 200223270003_R06C01 0.8619250 0.8738949 0.9088414 0.9012217 0.8124316 0.9113633
## 200223270003_R07C01 0.1238469 0.8362189 0.9081681 0.8898635 0.7773263 0.9160238
## 200223270006_R01C01 0.2021598 0.8079694 0.9111789 0.5779792 0.8334531 0.4861334
## 200223270006_R04C01 0.1383786 0.6966666 0.5331753 0.8809143 0.8039945 0.9067928
## cg05570109 cg00154902 cg14293999 cg17421046 cg15775217 cg09854620
## 200223270003_R02C01 0.3466611 0.5137741 0.2836710 0.9026993 0.5707441 0.5220587
## 200223270003_R03C01 0.5866750 0.8540746 0.9172023 0.9112100 0.9168327 0.8739646
## 200223270003_R06C01 0.4046471 0.8188126 0.9168166 0.8952031 0.6042521 0.8973149
## 200223270003_R07C01 0.6014355 0.4625776 0.9188336 0.9268852 0.9062231 0.8958863
## 200223270006_R01C01 0.5774881 0.4690086 0.1971116 0.1118337 0.9083515 0.9075331
## 200223270006_R04C01 0.8756826 0.4547219 0.9030919 0.4174370 0.6383270 0.9318820
## cg19503462 cg26757229 cg06378561 cg01680303 cg06715136 cg15535896
## 200223270003_R02C01 0.7951675 0.6723726 0.9389306 0.5095174 0.3400192 0.3382952
## 200223270003_R03C01 0.4537684 0.1422661 0.9377503 0.1344941 0.9259109 0.9253926
## 200223270003_R06C01 0.6997359 0.7933794 0.5154019 0.7573869 0.9079807 0.3320191
## 200223270003_R07C01 0.7189778 0.8074830 0.9403569 0.4772204 0.6782105 0.9409104
## 200223270006_R01C01 0.7301755 0.5265692 0.4956816 0.1176263 0.8369052 0.9326027
## 200223270006_R04C01 0.4207207 0.7341953 0.9268832 0.5133033 0.8807568 0.9156401
## cg00322003 cg27341708 cg03084184 cg26219488 cg18339359 cg06697310
## 200223270003_R02C01 0.1759911 0.48846610 0.8162981 0.9336638 0.8824858 0.8454609
## 200223270003_R03C01 0.5702070 0.02613847 0.7877128 0.9134707 0.9040272 0.8653044
## 200223270003_R06C01 0.3077122 0.86893582 0.4546397 0.9261878 0.8552121 0.2405168
## 200223270003_R07C01 0.6104341 0.02642300 0.7812413 0.9217866 0.3073106 0.8479193
## 200223270006_R01C01 0.6147419 0.47573455 0.7818230 0.4929692 0.8973742 0.8206613
## 200223270006_R04C01 0.2293759 0.89411974 0.7725853 0.9431574 0.2292800 0.7839595
## cg10369879 cg10738648 cg06536614 cg26069044 cg20685672 cg03327352
## 200223270003_R02C01 0.9218784 0.44931577 0.5824474 0.92401867 0.67121006 0.8851712
## 200223270003_R03C01 0.3149306 0.49894016 0.5746694 0.94072227 0.79320906 0.8786878
## 200223270003_R06C01 0.9141081 0.05552024 0.5773468 0.93321315 0.66136456 0.3042310
## 200223270003_R07C01 0.9054415 0.03730440 0.5848917 0.56567694 0.80838304 0.8273211
## 200223270006_R01C01 0.2917862 0.54952781 0.5669919 0.94369927 0.08291414 0.8774082
## 200223270006_R04C01 0.9200403 0.59358167 0.5718514 0.02040391 0.84460055 0.8829492
## cg00999469 cg23658987 cg05841700 cg01667144 cg15865722 cg13885788
## 200223270003_R02C01 0.3274080 0.79757644 0.2923544 0.8971484 0.89438595 0.9380618
## 200223270003_R03C01 0.2857719 0.07511718 0.9146488 0.3175389 0.90194372 0.9369476
## 200223270003_R06C01 0.2499229 0.10177571 0.3737990 0.9238364 0.92118977 0.5163017
## 200223270003_R07C01 0.2819622 0.46747992 0.5046468 0.8739442 0.09230759 0.9183376
## 200223270006_R01C01 0.2933539 0.76831297 0.8419031 0.2931961 0.93422668 0.5525542
## 200223270006_R04C01 0.2966623 0.08988532 0.9286652 0.8616530 0.92220002 0.9328289
## cg14527649 cg23161429 cg20370184 cg18821122 cg07523188 cg12534577
## 200223270003_R02C01 0.2678912 0.8956965 0.37710950 0.9291309 0.7509183 0.8585231
## 200223270003_R03C01 0.7954683 0.9099619 0.05737964 0.5901603 0.1524386 0.8493466
## 200223270003_R06C01 0.8350610 0.8833895 0.04740505 0.5779620 0.7127592 0.8395241
## 200223270003_R07C01 0.8428684 0.9134709 0.83572095 0.9251431 0.8464983 0.8511384
## 200223270006_R01C01 0.8231348 0.8738558 0.04056608 0.9217018 0.7847738 0.8804655
## 200223270006_R04C01 0.8022444 0.9104210 0.04038589 0.5412250 0.8231277 0.3029013
## cg02356645 cg03982462 cg04248279 cg13080267 cg27639199 cg08198851
## 200223270003_R02C01 0.5105903 0.8562777 0.8534976 0.78936656 0.67515415 0.6578905
## 200223270003_R03C01 0.5833923 0.6023731 0.8458854 0.78371483 0.67552763 0.6578186
## 200223270003_R06C01 0.5701428 0.8778458 0.8332786 0.09436069 0.06233093 0.1272153
## 200223270003_R07C01 0.5683381 0.8860227 0.3303204 0.09351259 0.05701332 0.8351465
## 200223270006_R01C01 0.5233692 0.8703107 0.5966878 0.45173796 0.05037694 0.8791156
## 200223270006_R04C01 0.9188670 0.8792860 0.8939599 0.49866715 0.08144161 0.1423737
## cg11331837 cg24873924 cg20507276 cg25561557 cg22274273 cg12682323
## 200223270003_R02C01 0.03692842 0.3060635 0.12238910 0.76736369 0.4209386 0.9397956
## 200223270003_R03C01 0.57150125 0.8640985 0.38721972 0.03851635 0.4246379 0.9003940
## 200223270003_R06C01 0.03182862 0.8259149 0.47978438 0.47259480 0.4196796 0.9157877
## 200223270003_R07C01 0.03832164 0.8333940 0.02261996 0.43364249 0.4164100 0.9048877
## 200223270006_R01C01 0.93008298 0.8761177 0.37465798 0.46211439 0.7951105 0.1065347
## 200223270006_R04C01 0.54004452 0.8585363 0.03570795 0.44651530 0.0229810 0.8836232
## cg17738613 cg21209485 cg03088219 cg03660162 cg10750306 cg27272246
## 200223270003_R02C01 0.6879612 0.8865053 0.844002862 0.8691767 0.04919915 0.8615873
## 200223270003_R03C01 0.6582258 0.8714878 0.007435243 0.5160770 0.55160081 0.8705287
## 200223270003_R06C01 0.1022257 0.2292550 0.120155222 0.9026304 0.54694332 0.8103777
## 200223270003_R07C01 0.8960156 0.2351526 0.826554308 0.5305691 0.59824543 0.0310881
## 200223270006_R01C01 0.8850702 0.8882046 0.066294915 0.9257451 0.53158639 0.7686536
## 200223270006_R04C01 0.8481916 0.2292483 0.574738383 0.8935772 0.05646838 0.4403542
## cg11438323 cg12738248 cg21854924 cg20139683 cg16178271 cg07028768
## 200223270003_R02C01 0.4863471 0.85430866 0.8729132 0.8717075 0.6445416 0.4496851
## 200223270003_R03C01 0.8984559 0.88010292 0.7162342 0.9059433 0.6178075 0.8536078
## 200223270003_R06C01 0.8722772 0.51121855 0.7520990 0.8962554 0.6641952 0.8356936
## 200223270003_R07C01 0.5026756 0.09131476 0.8641284 0.9218012 0.7148058 0.4245893
## 200223270006_R01C01 0.8809646 0.91529345 0.6498895 0.1708472 0.6138954 0.8835151
## 200223270006_R04C01 0.8717937 0.91911405 0.5943113 0.1067122 0.9414188 0.4514661
## cg26474732 cg00675157 cg23916408 cg05321907 cg17429539 cg06950937
## 200223270003_R02C01 0.7843252 0.9188438 0.1942275 0.2880477 0.7860900 0.8910968
## 200223270003_R03C01 0.8184088 0.9242325 0.9154993 0.1782629 0.7100923 0.2889345
## 200223270003_R06C01 0.7358417 0.9254708 0.8886255 0.8427929 0.7660838 0.9143801
## 200223270003_R07C01 0.7509296 0.5447244 0.8872447 0.8320504 0.6984969 0.8891079
## 200223270006_R01C01 0.8294938 0.5173554 0.2219945 0.2422218 0.6508597 0.8868617
## 200223270006_R04C01 0.8033167 0.9247232 0.1520624 0.2429551 0.2828452 0.9093273
## cg14240646 cg27086157 cg25758034 cg11247378 cg19377607 cg07480176
## 200223270003_R02C01 0.5391334 0.9224112 0.6114028 0.1591185 0.05377464 0.5171664
## 200223270003_R03C01 0.2538363 0.9219304 0.6649219 0.7874849 0.90570746 0.3760452
## 200223270003_R06C01 0.1864902 0.3224986 0.2393844 0.4807942 0.06636174 0.6998389
## 200223270003_R07C01 0.6402007 0.3455486 0.7071501 0.4537348 0.68788639 0.2189042
## 200223270006_R01C01 0.7696079 0.8988962 0.2301078 0.1537079 0.06338988 0.5570021
## 200223270006_R04C01 0.1490028 0.9159217 0.6891513 0.1686356 0.91551446 0.4501196
## cg27577781 cg11187460 cg03071582 cg12284872 cg02932958 cg12012426
## 200223270003_R02C01 0.8143535 0.03672179 0.9187811 0.8008333 0.7901008 0.9165048
## 200223270003_R03C01 0.8113185 0.92516409 0.5844421 0.7414569 0.4210489 0.9434768
## 200223270003_R06C01 0.8144274 0.03109553 0.6245558 0.7725267 0.3825995 0.9220044
## 200223270003_R07C01 0.7970617 0.53283119 0.9283683 0.7573369 0.7617081 0.9241284
## 200223270006_R01C01 0.8640044 0.54038146 0.5715416 0.7201607 0.8431126 0.9327143
## 200223270006_R04C01 0.8840237 0.91096169 0.6534650 0.8021446 0.7610084 0.9271167
## cg06118351 cg00696044 cg25436480 cg02320265 cg11227702 cg18819889
## 200223270003_R02C01 0.36339400 0.55608424 0.84251599 0.8853213 0.86486075 0.9156157
## 200223270003_R03C01 0.47148604 0.07552381 0.49940321 0.4686314 0.49184121 0.9004455
## 200223270003_R06C01 0.86559618 0.79270858 0.34943119 0.4838749 0.02543724 0.9054439
## 200223270003_R07C01 0.83494303 0.03548419 0.85244913 0.8986848 0.45150971 0.9089935
## 200223270006_R01C01 0.02632111 0.10714386 0.44545117 0.8987560 0.89086877 0.9065397
## 200223270006_R04C01 0.83329300 0.18420803 0.02575036 0.4768520 0.87675947 0.9242767
## cg06112204 cg19512141 cg24506579 cg00272795 cg21697769 cg12776173
## 200223270003_R02C01 0.5251592 0.8209161 0.5244337 0.46365138 0.8946108 0.10388038
## 200223270003_R03C01 0.8773488 0.7903543 0.5794845 0.82839260 0.2822953 0.87306345
## 200223270003_R06C01 0.8867975 0.8404684 0.9427785 0.07231279 0.8698740 0.70094907
## 200223270003_R07C01 0.5613799 0.2202759 0.9323844 0.78303831 0.9134887 0.11367159
## 200223270006_R01C01 0.9184122 0.8059589 0.9185355 0.78219952 0.2683820 0.09458405
## 200223270006_R04C01 0.9152514 0.7020247 0.4332642 0.44408249 0.2765740 0.86532175
## cg07138269 cg17906851 cg08779649 cg10985055 cg08584917 cg04664583
## 200223270003_R02C01 0.5002290 0.9488392 0.44449401 0.8518169 0.5663205 0.5572814
## 200223270003_R03C01 0.9426707 0.9529718 0.45076825 0.8631895 0.9019732 0.5881190
## 200223270003_R06C01 0.5057781 0.6462151 0.04810217 0.5456633 0.9187789 0.9352717
## 200223270003_R07C01 0.9400527 0.9553497 0.42715969 0.8825100 0.6007449 0.9350230
## 200223270006_R01C01 0.9321602 0.6222117 0.89313476 0.8841690 0.9069098 0.9424588
## 200223270006_R04C01 0.9333501 0.6441202 0.59523771 0.8407797 0.9263584 0.9379537
## cg01933473 cg00689685 cg14307563 cg12784167 cg24851651 cg15633912
## 200223270003_R02C01 0.2589014 0.7019389 0.1855966 0.81503498 0.03674702 0.1605530
## 200223270003_R03C01 0.6726133 0.8634268 0.8916957 0.02811410 0.05358297 0.9333421
## 200223270003_R06C01 0.2642560 0.6378795 0.8750052 0.03073269 0.05968923 0.8737362
## 200223270003_R07C01 0.1978068 0.8624541 0.8975663 0.84775699 0.60864179 0.9137334
## 200223270006_R01C01 0.7599441 0.6361891 0.8762842 0.83825789 0.08825834 0.9169706
## 200223270006_R04C01 0.7405661 0.6356260 0.9168614 0.45475291 0.91932068 0.8890004
## cg12466610 cg16788319 cg20678988 cg01413796 cg01549082
## 200223270003_R02C01 0.05767659 0.9379870 0.8438718 0.1345128 0.2924138
## 200223270003_R03C01 0.59131778 0.8913429 0.8548886 0.2830672 0.7065693
## 200223270003_R06C01 0.06939623 0.8680680 0.7786685 0.8194681 0.2895440
## 200223270003_R07C01 0.04527733 0.8811444 0.8260541 0.9007710 0.6422955
## 200223270006_R01C01 0.05212904 0.3123481 0.3295384 0.2603027 0.8471236
## 200223270006_R04C01 0.05104033 0.2995627 0.8541667 0.9207672 0.6949888
df_LRM1<-processed_data
featureName_LRM1<-AfterProcess_FeatureName
library(glmnet)
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_LRM1$DX, p = 0.7, list = FALSE)
trainData <- df_LRM1[trainIndex, ]
testData <- df_LRM1[-trainIndex, ]
dim(trainData)
## [1] 455 156
dim(testData)
## [1] 193 156
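`createDataPartition` draws a stratified sample, so the 70/30 split above preserves the CN/Dementia/MCI class proportions in both partitions. A minimal base-R sketch of the same idea (not caret's exact algorithm, and using a toy label vector rather than the real `DX` column):

```r
set.seed(123)
# Toy outcome with the three diagnosis labels used above
dx <- factor(rep(c("CN", "Dementia", "MCI"), times = c(60, 20, 120)))

# Sample 70% within each class, then combine the per-class indices
train_idx <- unlist(lapply(split(seq_along(dx), dx), function(ix) {
  sample(ix, size = floor(0.7 * length(ix)))
}))

# Class proportions are preserved (up to rounding) in the training split
print(round(prop.table(table(dx)), 2))
print(round(prop.table(table(dx[train_idx])), 2))
```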
ctrl <- trainControl(method = "cv", number = 5)
model_LRM1 <- caret::train(DX ~ ., data = trainData, method = "glmnet", trControl = ctrl)
predictions <- predict(model_LRM1, newdata = testData,type="raw")
cm_FeatEval_Mean_LRM1<-caret::confusionMatrix(predictions, testData$DX)
print(cm_FeatEval_Mean_LRM1)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN Dementia MCI
## CN 46 7 14
## Dementia 3 10 4
## MCI 17 11 81
##
## Overall Statistics
##
## Accuracy : 0.7098
## 95% CI : (0.6403, 0.7728)
## No Information Rate : 0.513
## P-Value [Acc > NIR] : 2.018e-08
##
## Kappa : 0.4987
##
## Mcnemar's Test P-Value : 0.1607
##
## Statistics by Class:
##
## Class: CN Class: Dementia Class: MCI
## Sensitivity 0.6970 0.35714 0.8182
## Specificity 0.8346 0.95758 0.7021
## Pos Pred Value 0.6866 0.58824 0.7431
## Neg Pred Value 0.8413 0.89773 0.7857
## Prevalence 0.3420 0.14508 0.5130
## Detection Rate 0.2383 0.05181 0.4197
## Detection Prevalence 0.3472 0.08808 0.5648
## Balanced Accuracy 0.7658 0.65736 0.7602
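The per-class statistics above follow directly from the confusion matrix. As a check, the `Class: CN` row can be reproduced by hand from the counts printed above:

```r
# Confusion matrix from the output above: rows = Prediction, cols = Reference
cm <- matrix(c(46, 3, 17,    # Reference CN
               7, 10, 11,    # Reference Dementia
               14, 4, 81),   # Reference MCI
             nrow = 3,
             dimnames = list(Prediction = c("CN", "Dementia", "MCI"),
                             Reference  = c("CN", "Dementia", "MCI")))

sens_CN <- cm["CN", "CN"] / sum(cm[, "CN"])  # TP / all reference-CN samples
spec_CN <- sum(cm[-1, -1]) / sum(cm[, -1])   # TN / all reference non-CN samples
bal_CN  <- (sens_CN + spec_CN) / 2
round(c(sensitivity = sens_CN, specificity = spec_CN, balanced = bal_CN), 4)
# matches the printed 0.6970 / 0.8346 / 0.7658 for Class: CN
```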
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
cm_FeatEval_Mean_LRM1_Accuracy <- cm_FeatEval_Mean_LRM1$overall["Accuracy"]
cm_FeatEval_Mean_LRM1_Kappa <- cm_FeatEval_Mean_LRM1$overall["Kappa"]
print(cm_FeatEval_Mean_LRM1_Accuracy)
## Accuracy
## 0.7098446
print(cm_FeatEval_Mean_LRM1_Kappa)
## Kappa
## 0.4987013
print(model_LRM1)
## glmnet
##
## 455 samples
## 155 predictors
## 3 classes: 'CN', 'Dementia', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 364, 365, 363, 364, 364
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0.10 0.0001810831 0.6350263 0.3962356
## 0.10 0.0018108309 0.6460636 0.4102125
## 0.10 0.0181083090 0.6548792 0.4144240
## 0.55 0.0001810831 0.6263550 0.3765308
## 0.55 0.0018108309 0.6505792 0.4121576
## 0.55 0.0181083090 0.6483336 0.3870111
## 1.00 0.0001810831 0.6065010 0.3457739
## 1.00 0.0018108309 0.6394930 0.3907984
## 1.00 0.0181083090 0.5867925 0.2663062
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.01810831.
train_predictions <- predict(model_LRM1, newdata = trainData, type = "raw")
train_accuracy <- mean(train_predictions == trainData$DX)
FeatEval_Mean_LRM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.96043956043956"
print(FeatEval_Mean_LRM1_trainAccuracy)
## [1] 0.9604396
mean_accuracy_model_LRM1 <- mean(model_LRM1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM1)
## [1] 0.6326693
FeatEval_Mean_mean_accuracy_cv_LRM1 <- mean_accuracy_model_LRM1
print(FeatEval_Mean_mean_accuracy_cv_LRM1)
## [1] 0.6326693
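Note that `mean(model_LRM1$results$Accuracy)` averages over all nine alpha/lambda combinations, including poorly tuned ones, so it understates the selected model's CV accuracy (0.6549 at alpha = 0.1, lambda = 0.0181). Extracting the best tuning row instead could be sketched as follows, rebuilding the results table from the printed model summary above:

```r
# Resampling results copied from the printed model summary above
results <- data.frame(
  alpha  = rep(c(0.10, 0.55, 1.00), each = 3),
  lambda = rep(c(0.0001810831, 0.0018108309, 0.0181083090), times = 3),
  Accuracy = c(0.6350263, 0.6460636, 0.6548792,
               0.6263550, 0.6505792, 0.6483336,
               0.6065010, 0.6394930, 0.5867925))

best <- results[which.max(results$Accuracy), ]
print(best)             # alpha = 0.10, lambda = 0.0181083090, Accuracy = 0.6548792
mean(results$Accuracy)  # the 0.6326693 reported above
```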
library(caret)
library(pROC)
if (METHOD_FEATURE_FLAG ==5){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_LRM1_AUC <- auc_value
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==4 || METHOD_FEATURE_FLAG ==6){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_LRM1_AUC <- auc_value
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==3){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_LRM1_AUC <- auc_value
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==1){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData$DX)
for (class in classes) {
binary_labels <- ifelse(testData$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = c("blue", 2:length(classes) + 1), lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.8487
## The AUC value for class CN is: 0.8487235
##
## Class: Dementia
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.8312
## The AUC value for class Dementia is: 0.8311688
##
## Class: MCI
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.8189
## The AUC value for class MCI is: 0.818934
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Mean_LRM1_AUC <- mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.8329421
print(FeatEval_Mean_LRM1_AUC)
## [1] 0.8329421
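Each per-class AUC above is a one-versus-rest binary AUC: the probability that a randomly chosen case scores higher than a randomly chosen control. The rank-based (Mann-Whitney) form makes this explicit. A minimal base-R sketch on toy data, assuming ties are handled by midranks; `auc_rank` is an illustrative helper, not part of `pROC`:

```r
# Rank-based AUC: P(score of a random positive > score of a random negative)
auc_rank <- function(labels, scores) {
  r <- rank(scores)                      # midranks handle tied scores
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy one-vs-rest example: perfectly separated scores give AUC = 1
labels <- c(0, 0, 1, 1)
scores <- c(0.1, 0.2, 0.8, 0.9)
auc_rank(labels, scores)  # 1
```

Averaging such per-class values, as `mean(auc_values)` does above, gives the macro-average AUC for the multi-class model.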
importance_model_LRM1 <- varImp(model_LRM1)
print(importance_model_LRM1)
## glmnet variable importance
##
## variables are sorted by maximum importance across the classes
## only 20 most important variables shown (out of 155)
##
## CN Dementia MCI
## PC1 90.424 1.000e+02 0.000
## PC2 46.616 7.877e+01 0.000
## PC3 6.073 0.000e+00 68.217
## cg00962106 63.057 1.183e+01 36.936
## cg02225060 23.027 1.263e+01 51.151
## cg14710850 49.621 8.391e+00 25.398
## cg27452255 49.050 1.786e+01 11.826
## cg02981548 26.232 5.626e+00 49.013
## cg08861434 48.681 0.000e+00 42.742
## cg19503462 25.906 4.812e+01 5.791
## cg07152869 27.973 4.673e+01 1.360
## cg16749614 11.547 1.797e+01 45.945
## cg05096415 1.413 4.492e+01 28.934
## cg23432430 44.233 3.509e+00 25.256
## cg17186592 3.088 4.200e+01 26.683
## cg00247094 15.875 4.167e+01 10.436
## cg09584650 41.421 6.534e+00 18.532
## cg11133939 24.203 1.687e-03 40.480
## cg16715186 39.188 7.688e+00 17.049
## cg03129555 12.446 3.860e+01 8.425
plot(importance_model_LRM1, top = 20, main = "Variable Importance Plot")
importance_model_LRM1_df<-importance_model_LRM1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 ||METHOD_FEATURE_FLAG==6){
importance_final_model_LRM1 <- varImp(model_LRM1$finalModel)
library(dplyr)
ordered_importance_final_model_LRM1 <- importance_final_model_LRM1 %>% arrange(desc(Overall))
print(ordered_importance_final_model_LRM1)
}
if(METHOD_FEATURE_FLAG==1){
# For the multi-class classification case we keep, for each feature,
# the maximum importance value across the three classes.
# Add a column for that maximum and order the features by it.
importance_model_LRM1_df$Feature<-rownames(importance_model_LRM1_df)
importance_model_LRM1_df <- importance_model_LRM1_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_model_LRM1_df)
}
## CN Dementia MCI Feature MaxImportance
## 1 90.4236664 1.000000e+02 0.0000000 PC1 100.0000000
## 2 46.6158230 7.876514e+01 0.0000000 PC2 78.7651390
## 3 6.0732385 0.000000e+00 68.2173286 PC3 68.2173286
## 4 63.0567895 1.183052e+01 36.9364886 cg00962106 63.0567895
## 5 23.0265832 1.262802e+01 51.1506685 cg02225060 51.1506685
## 6 49.6209826 8.390836e+00 25.3977905 cg14710850 49.6209826
## 7 49.0496838 1.785809e+01 11.8258108 cg27452255 49.0496838
## 8 26.2323147 5.625925e+00 49.0125309 cg02981548 49.0125309
## 9 48.6806848 0.000000e+00 42.7421514 cg08861434 48.6806848
## 10 25.9055990 4.811555e+01 5.7906582 cg19503462 48.1155464
## 11 27.9726027 4.672801e+01 1.3602948 cg07152869 46.7280077
## 12 11.5469865 1.796701e+01 45.9447524 cg16749614 45.9447524
## 13 1.4125525 4.491749e+01 28.9342186 cg05096415 44.9174899
## 14 44.2328180 3.508617e+00 25.2561347 cg23432430 44.2328180
## 15 3.0875872 4.199779e+01 26.6830942 cg17186592 41.9977939
## 16 15.8745094 4.166997e+01 10.4359896 cg00247094 41.6699659
## 17 41.4211230 6.534278e+00 18.5319272 cg09584650 41.4211230
## 18 24.2034820 1.687251e-03 40.4800617 cg11133939 40.4800617
## 19 39.1879982 7.688370e+00 17.0487674 cg16715186 39.1879982
## 20 12.4459376 3.860234e+01 8.4248631 cg03129555 38.6023357
## 21 3.1921180 2.009614e+01 38.4787320 cg08857872 38.4787320
## 22 12.1315784 3.682837e+01 11.1247121 cg06864789 36.8283695
## 23 0.0000000 3.530183e+01 26.7385538 cg14924512 35.3018323
## 24 7.2101094 1.187214e+01 34.9148750 cg16652920 34.9148750
## 25 19.1219402 3.459956e+01 0.0000000 cg03084184 34.5995575
## 26 3.6609079 1.335948e+01 34.1622823 cg26219488 34.1622823
## 27 13.4822639 3.380877e+01 6.0644062 cg20913114 33.8087688
## 28 7.1343871 3.346938e+01 11.8191158 cg06378561 33.4693793
## 29 33.3214329 1.548017e+01 2.1007087 cg26948066 33.3214329
## 30 0.5721452 3.328675e+01 17.4675204 cg25259265 33.2867495
## 31 33.2597536 0.000000e+00 21.5563401 cg06536614 33.2597536
## 32 1.6480480 3.232505e+01 17.2508040 cg24859648 32.3250481
## 33 12.7630119 3.077963e+01 2.2041139 cg12279734 30.7796293
## 34 30.6939618 1.116751e+01 2.4904135 cg03982462 30.6939618
## 35 1.2191910 3.061511e+01 16.6088463 cg05841700 30.6151075
## 36 29.8316998 7.646344e+00 7.7265359 cg11227702 29.8316998
## 37 25.3622898 0.000000e+00 29.0155964 cg12146221 29.0155964
## 38 9.6427110 8.951059e+00 28.9324214 cg02621446 28.9324214
## 39 0.0000000 2.259323e+01 28.8392975 cg00616572 28.8392975
## 40 28.4402938 8.977561e+00 6.5457544 cg15535896 28.4402938
## 41 25.4671036 0.000000e+00 28.2002270 cg02372404 28.2002270
## 42 5.0584162 2.777780e+01 8.1363657 cg09854620 27.7777988
## 43 27.6105358 0.000000e+00 15.8569776 cg04248279 27.6105358
## 44 3.9947457 7.707938e+00 27.5383597 cg20678988 27.5383597
## 45 0.0000000 2.752900e+01 13.8294115 cg24861747 27.5290027
## 46 27.4710197 1.566117e+01 0.0000000 cg10240127 27.4710197
## 47 7.7716675 7.237123e+00 27.2251410 cg16771215 27.2251410
## 48 0.6477691 2.697150e+01 14.6478759 cg01667144 26.9715010
## 49 26.9373979 8.943869e+00 2.8090450 cg13080267 26.9373979
## 50 0.0000000 2.615370e+01 26.5923694 cg02494911 26.5923694
## 51 9.3817536 2.645606e+01 5.1251747 cg10750306 26.4560604
## 52 25.4583653 1.204429e+00 11.2684574 cg11438323 25.4583653
## 53 4.8711575 4.022583e+00 25.4129088 cg06715136 25.4129088
## 54 25.1290464 0.000000e+00 15.3740479 cg04412904 25.1290464
## 55 4.7625575 2.483766e+01 5.3951689 cg12738248 24.8376618
## 56 24.4006278 0.000000e+00 18.6839449 cg03071582 24.4006278
## 57 0.0000000 2.429556e+01 15.8184784 cg05570109 24.2955559
## 58 24.2246662 2.027283e+01 0.0000000 cg15775217 24.2246662
## 59 0.0000000 1.993766e+01 24.1861184 cg24873924 24.1861184
## 60 7.5582296 4.154304e+00 24.1289924 cg17738613 24.1289924
## 61 23.8214092 0.000000e+00 20.8147297 cg01921484 23.8214092
## 62 0.0000000 1.632160e+01 23.6854396 cg10369879 23.6854396
## 63 0.0000000 1.840061e+01 23.6420901 cg27341708 23.6420901
## 64 0.0000000 2.355222e+01 21.4288853 cg12534577 23.5522196
## 65 0.0000000 2.343045e+01 17.8269147 cg18821122 23.4304500
## 66 4.6170471 6.921189e+00 23.3527287 cg12682323 23.3527287
## 67 23.3209910 0.000000e+00 14.1833195 cg05234269 23.3209910
## 68 23.0307834 0.000000e+00 22.7938958 cg20685672 23.0307834
## 69 20.3680497 0.000000e+00 22.8562527 cg12228670 22.8562527
## 70 22.7069964 3.660633e+00 8.3346151 cg11331837 22.7069964
## 71 0.0000000 2.268753e+01 20.8599811 cg01680303 22.6875341
## 72 22.4129176 1.160843e+00 10.2276178 cg17421046 22.4129176
## 73 22.2738923 8.042670e+00 2.2622928 cg03088219 22.2738923
## 74 22.2627889 1.928880e+01 0.0000000 cg00322003 22.2627889
## 75 22.2407874 1.530789e+01 0.0000000 cg02356645 22.2407874
## 76 5.8928437 2.207499e+01 1.2617305 cg01013522 22.0749918
## 77 12.6196526 0.000000e+00 21.8196303 cg00272795 21.8196303
## 78 21.6367031 0.000000e+00 14.5418861 cg25758034 21.6367031
## 79 4.7726068 2.162589e+01 1.1857577 cg26474732 21.6258905
## 80 0.0000000 2.126494e+01 17.6339938 cg16579946 21.2649390
## 81 9.6070487 2.121677e+01 0.0000000 cg07523188 21.2167720
## 82 21.2090861 4.531801e+00 5.6485933 cg11187460 21.2090861
## 83 0.0000000 1.703369e+01 20.8087502 cg14527649 20.8087502
## 84 2.7288778 4.858758e+00 20.5395653 cg20370184 20.5395653
## 85 20.5238610 0.000000e+00 13.7146333 cg17429539 20.5238610
## 86 0.0000000 2.027184e+01 10.0202802 cg20507276 20.2718432
## 87 1.1829922 6.819762e+00 20.1949298 cg13885788 20.1949298
## 88 0.0000000 1.557801e+01 20.0711568 cg16178271 20.0711568
## 89 5.5958884 1.533155e+00 19.9939644 cg10738648 19.9939644
## 90 5.1484910 1.991679e+01 2.7511644 cg26069044 19.9167949
## 91 3.1995623 4.951416e+00 19.7913728 cg25879395 19.7913728
## 92 19.6367721 0.000000e+00 12.1257134 cg06112204 19.6367721
## 93 3.2337436 1.923270e+01 1.2688054 cg23161429 19.2327006
## 94 19.0290833 0.000000e+00 8.8811450 cg25436480 19.0290833
## 95 18.8963290 1.895416e+01 0.0000000 cg26757229 18.9541606
## 96 18.8489892 8.146546e+00 0.0000000 cg02932958 18.8489892
## 97 6.3385640 1.862396e+01 0.9542925 cg18339359 18.6239621
## 98 18.5833313 1.513099e+00 1.8899782 cg06950937 18.5833313
## 99 12.0414352 1.857722e+01 0.0000000 cg23916408 18.5772240
## 100 1.5261549 3.184899e+00 18.1654164 cg12784167 18.1654164
## 101 11.9154282 0.000000e+00 18.1155156 cg07480176 18.1155156
## 102 0.0000000 5.496876e+00 17.6957094 cg15865722 17.6957094
## 103 17.6745178 0.000000e+00 13.0402417 cg27577781 17.6745178
## 104 17.1561047 2.943098e+00 2.5244160 cg05321907 17.1561047
## 105 16.8696278 0.000000e+00 7.5600874 cg03660162 16.8696278
## 106 16.7547601 0.000000e+00 9.9115899 cg07138269 16.7547601
## 107 16.7359257 9.081285e-04 5.4548067 cg20139683 16.7359257
## 108 1.5127234 1.661837e+01 3.6050266 cg12284872 16.6183749
## 109 16.5320336 0.000000e+00 15.3309720 cg03327352 16.5320336
## 110 0.0000000 1.652355e+01 12.9102072 cg23658987 16.5235495
## 111 0.0000000 1.474794e+01 16.1731669 cg21854924 16.1731669
## 112 15.7781397 0.000000e+00 6.8410564 cg21697769 15.7781397
## 113 15.6679755 5.754754e+00 0.0000000 cg19512141 15.6679755
## 114 10.3149355 0.000000e+00 15.4737089 cg08198851 15.4737089
## 115 0.4260012 1.508768e+01 0.8270265 cg00675157 15.0876767
## 116 0.0000000 5.691150e+00 15.0114537 cg01153376 15.0114537
## 117 1.8023617 1.495677e+01 0.7652334 cg01933473 14.9567667
## 118 14.9041545 0.000000e+00 4.5865304 cg12776173 14.9041545
## 119 0.0000000 1.067475e+01 14.7131793 cg14564293 14.7131793
## 120 12.4078661 0.000000e+00 14.5714652 cg24851651 14.5714652
## 121 0.0000000 1.452429e+01 2.2532934 cg22274273 14.5242914
## 122 12.7839527 1.451759e+01 0.0000000 cg25561557 14.5175853
## 123 13.7937627 1.439434e+01 0.0000000 cg21209485 14.3943424
## 124 3.9002055 1.430129e+01 0.0000000 cg10985055 14.3012935
## 125 8.0836178 0.000000e+00 14.2414682 cg14293999 14.2414682
## 126 0.0000000 6.083721e+00 13.9742620 cg18819889 13.9742620
## 127 7.9121604 1.390587e+01 0.0000000 cg24506579 13.9058683
## 128 10.4879315 0.000000e+00 13.8167304 cg19377607 13.8167304
## 129 2.6273452 1.361436e+01 0.0000000 cg06697310 13.6143633
## 130 13.5716123 0.000000e+00 10.1626215 cg00696044 13.5716123
## 131 0.0000000 0.000000e+00 13.1070671 cg01549082 13.1070671
## 132 0.0000000 6.885929e+00 13.0744631 cg01128042 13.0744631
## 133 0.2711506 1.248014e+01 1.1549034 cg00999469 12.4801390
## 134 0.0000000 1.079026e+01 12.3791517 cg06118351 12.3791517
## 135 0.0000000 1.123953e+01 11.7870153 cg12012426 11.7870153
## 136 11.7355779 9.453096e+00 0.0000000 cg08584917 11.7355779
## 137 11.6965026 0.000000e+00 11.1694390 cg27272246 11.6965026
## 138 0.0000000 1.168019e+01 2.2462032 cg15633912 11.6801939
## 139 11.3472005 1.977304e+00 0.0000000 cg17906851 11.3472005
## 140 1.1947138 1.133359e+01 0.0000000 cg16788319 11.3335935
## 141 8.9803747 0.000000e+00 11.2948800 cg07028768 11.2948800
## 142 0.0000000 3.117611e+00 10.7425453 cg27086157 10.7425453
## 143 1.8005341 9.613392e+00 0.0000000 cg14240646 9.6133916
## 144 0.0000000 9.464076e+00 9.1968206 cg00154902 9.4640757
## 145 6.6622696 0.000000e+00 9.1080687 cg14307563 9.1080687
## 146 0.0000000 8.519950e+00 0.0000000 cg02320265 8.5199503
## 147 8.2042959 0.000000e+00 7.0539563 cg08779649 8.2042959
## 148 7.6562298 0.000000e+00 7.9627233 cg04664583 7.9627233
## 149 0.0000000 0.000000e+00 6.6052912 cg12466610 6.6052912
## 150 6.2519012 3.701166e+00 0.0000000 cg27639199 6.2519012
## 151 0.0000000 0.000000e+00 5.8434575 cg15501526 5.8434575
## 152 0.0000000 4.839811e+00 3.6619673 cg00689685 4.8398115
## 153 2.7986803 0.000000e+00 0.0787605 cg01413796 2.7986803
## 154 0.0000000 0.000000e+00 2.1277610 cg11247378 2.1277610
## 155 0.5214097 0.000000e+00 0.6361066 age.now 0.6361066
if (!require(reshape2)) {
  install.packages("reshape2")
}
library(reshape2)
if (METHOD_FEATURE_FLAG == 1) {
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")
  ggplot(importance_melted_LRM1_df,
         aes(x = reorder(Feature, -Importance),
             y = Importance, fill = Class)) +
    geom_bar(stat = "identity", position = "dodge") +
    coord_flip() +
    labs(title = "Feature Importance Across Classes",
         x = "Feature",
         y = "Importance",
         fill = "Class") +
    theme_minimal()
}
## Warning: The melt generic in data.table has been passed a data.frame and will attempt to
## redirect to the relevant reshape2 method; please note that reshape2 is superseded and is no
## longer actively developed, and this redirection is now deprecated. To continue using melt
## methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the
## namespace, i.e. reshape2::melt(.). In the next version, this warning will become an error.
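The warning above notes that routing `melt()` through data.table's generic to reshape2 is deprecated. One remedy is to prepend the namespace (`reshape2::melt(...)`); another is to switch to tidyr, which supersedes reshape2. A minimal sketch of the tidyr equivalent on a toy data frame (the column names here are made up to mirror the importance table):

```r
library(tidyr)

# Toy stand-in for importance_model_LRM1_df after dropping MaxImportance
toy <- data.frame(Feature = c("cg0001", "cg0002"),
                  CN = c(10, 0), Dementia = c(5, 20), MCI = c(0, 3))

# pivot_longer() is the maintained replacement for reshape2::melt()
long <- tidyr::pivot_longer(toy, cols = -Feature,
                            names_to = "Class", values_to = "Importance")
nrow(long)  # 6 rows: 2 features x 3 classes
```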
if (METHOD_FEATURE_FLAG == 1) {
  print(importance_model_LRM1_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_model_LRM1_df, n = 20)$Feature)
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    head(20) %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")
  ggplot(importance_melted_LRM1_df,
         aes(x = reorder(Feature, -Importance),
             y = Importance, fill = Class)) +
    geom_bar(stat = "identity", position = "dodge") +
    coord_flip() +
    labs(title = "Feature Importance Across Classes",
         x = "Feature",
         y = "Importance",
         fill = "Class") +
    theme_minimal()
}
## CN Dementia MCI Feature MaxImportance
## 1 90.423666 1.000000e+02 0.000000 PC1 100.00000
## 2 46.615823 7.876514e+01 0.000000 PC2 78.76514
## 3 6.073238 0.000000e+00 68.217329 PC3 68.21733
## 4 63.056789 1.183052e+01 36.936489 cg00962106 63.05679
## 5 23.026583 1.262802e+01 51.150669 cg02225060 51.15067
## 6 49.620983 8.390836e+00 25.397790 cg14710850 49.62098
## 7 49.049684 1.785809e+01 11.825811 cg27452255 49.04968
## 8 26.232315 5.625925e+00 49.012531 cg02981548 49.01253
## 9 48.680685 0.000000e+00 42.742151 cg08861434 48.68068
## 10 25.905599 4.811555e+01 5.790658 cg19503462 48.11555
## 11 27.972603 4.672801e+01 1.360295 cg07152869 46.72801
## 12 11.546987 1.796701e+01 45.944752 cg16749614 45.94475
## 13 1.412552 4.491749e+01 28.934219 cg05096415 44.91749
## 14 44.232818 3.508617e+00 25.256135 cg23432430 44.23282
## 15 3.087587 4.199779e+01 26.683094 cg17186592 41.99779
## 16 15.874509 4.166997e+01 10.435990 cg00247094 41.66997
## 17 41.421123 6.534278e+00 18.531927 cg09584650 41.42112
## 18 24.203482 1.687251e-03 40.480062 cg11133939 40.48006
## 19 39.187998 7.688370e+00 17.048767 cg16715186 39.18800
## 20 12.445938 3.860234e+01 8.424863 cg03129555 38.60234
## [1] "the top 20 features based on max way:"
## [1] "PC1" "PC2" "PC3" "cg00962106" "cg02225060" "cg14710850" "cg27452255"
## [8] "cg02981548" "cg08861434" "cg19503462" "cg07152869" "cg16749614" "cg05096415" "cg23432430"
## [15] "cg17186592" "cg00247094" "cg09584650" "cg11133939" "cg16715186" "cg03129555"
## Warning: The melt generic in data.table has been passed a data.frame and will attempt to
## redirect to the relevant reshape2 method; please note that reshape2 is superseded and is no
## longer actively developed, and this redirection is now deprecated. To continue using melt
## methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the
## namespace, i.e. reshape2::melt(.). In the next version, this warning will become an error.
table(df_LRM1$DX)
##
## CN Dementia MCI
## 221 94 333
prop.table(table(df_LRM1$DX))
##
## CN Dementia MCI
## 0.3410494 0.1450617 0.5138889
table(trainData$DX)
##
## CN Dementia MCI
## 155 66 234
prop.table(table(trainData$DX))
##
## CN Dementia MCI
## 0.3406593 0.1450549 0.5142857
barplot(table(df_LRM1$DX), main = "Whole Data Class Distribution")
For the training data set:
barplot(table(trainData$DX), main = "Train Data Class Distribution")
Let’s calculate the imbalance ratio: the number of samples in the majority class divided by the number of samples in the minority class. A high ratio indicates severe class imbalance.
class_counts <- table(df_LRM1$DX)
imbalance_ratio <- max(class_counts) / min(class_counts)
print("The imbalance ratio of the whole data set is:")
## [1] "The imbalance ratio of the whole data set is:"
print(imbalance_ratio)
## [1] 3.542553
class_counts <- table(trainData$DX)
imbalance_ratio <- max(class_counts) / min(class_counts)
print("The imbalance ratio of the training data set is:")
## [1] "The imbalance ratio of the training data set is:"
print(imbalance_ratio)
## [1] 3.545455
Let’s run a chi-squared goodness-of-fit test, which checks whether the class distribution deviates significantly from a balanced one; the test’s p-value indicates how significant the class imbalance is.
chisq.test(table(df_LRM1$DX))
##
## Chi-squared test for given probabilities
##
## data: table(df_LRM1$DX)
## X-squared = 132.4, df = 2, p-value < 2.2e-16
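The reported statistic can be reproduced by hand from the class counts, since `chisq.test` on a single table tests against equal expected frequencies (a quick arithmetic check, not part of the original pipeline):

```r
# Goodness-of-fit statistic for counts 221 / 94 / 333 against a uniform split
obs <- c(CN = 221, Dementia = 94, MCI = 333)
expected <- rep(sum(obs) / length(obs), length(obs))  # 216 each
x2 <- sum((obs - expected)^2 / expected)
round(x2, 1)  # 132.4, matching the X-squared value above
```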
chisq.test(table(trainData$DX))
##
## Chi-squared test for given probabilities
##
## data: table(trainData$DX)
## X-squared = 93.156, df = 2, p-value < 2.2e-16
library(smotefamily)
smote_data_LGR_1 <- SMOTE(X = trainData[, !names(trainData) %in% "DX"],
                          target = trainData$DX, K = 5, dup_size = 1)
# Extract the new balanced dataset
balanced_data_LGR_1 <- smote_data_LGR_1$data
colnames(balanced_data_LGR_1)[colnames(balanced_data_LGR_1) == "class"] <- "DX"
table(balanced_data_LGR_1$DX)
##
## CN Dementia MCI
## 155 132 234
dim(balanced_data_LGR_1)
## [1] 521 156
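The class table above is consistent with how `smotefamily::SMOTE` grows the minority class: with `dup_size = 1` it synthesizes one new sample per original minority sample, so Dementia doubles from 66 to 132 while CN and MCI are left untouched. A sanity check on the counts, assuming `dup_size` behaves as documented:

```r
n_dementia <- 66  # minority-class count in trainData (from table(trainData$DX))
dup_size <- 1     # synthetic samples generated per original minority sample
n_dementia * (1 + dup_size)  # 132, matching table(balanced_data_LGR_1$DX)
```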
ctrl <- trainControl(method = "cv", number = 5)
model_LRM2 <- caret::train(DX ~ ., data = balanced_data_LGR_1, method = "glmnet", trControl = ctrl)
predictions <- predict(model_LRM2, newdata = testData)
caret::confusionMatrix(predictions, testData$DX)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN Dementia MCI
## CN 45 6 15
## Dementia 4 11 6
## MCI 17 11 78
##
## Overall Statistics
##
## Accuracy : 0.6943
## 95% CI : (0.6241, 0.7584)
## No Information Rate : 0.513
## P-Value [Acc > NIR] : 2.356e-07
##
## Kappa : 0.4779
##
## Mcnemar's Test P-Value : 0.5733
##
## Statistics by Class:
##
## Class: CN Class: Dementia Class: MCI
## Sensitivity 0.6818 0.39286 0.7879
## Specificity 0.8346 0.93939 0.7021
## Pos Pred Value 0.6818 0.52381 0.7358
## Neg Pred Value 0.8346 0.90116 0.7586
## Prevalence 0.3420 0.14508 0.5130
## Detection Rate 0.2332 0.05699 0.4041
## Detection Prevalence 0.3420 0.10881 0.5492
## Balanced Accuracy 0.7582 0.66613 0.7450
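The per-class Balanced Accuracy rows reported by caret are simply the mean of sensitivity and specificity; for example, for class CN:

```r
# Balanced accuracy = (sensitivity + specificity) / 2
sens_CN <- 0.6818
spec_CN <- 0.8346
round((sens_CN + spec_CN) / 2, 4)  # 0.7582, matching the CN column above
```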
print(model_LRM2)
## glmnet
##
## 521 samples
## 155 predictors
## 3 classes: 'CN', 'Dementia', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 416, 417, 417, 417, 417
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0.10 0.000186946 0.7083883 0.5523130
## 0.10 0.001869460 0.7121978 0.5563269
## 0.10 0.018694597 0.7180220 0.5649649
## 0.55 0.000186946 0.6987912 0.5369622
## 0.55 0.001869460 0.7102930 0.5525186
## 0.55 0.018694597 0.6872894 0.5142517
## 1.00 0.000186946 0.6834432 0.5136505
## 1.00 0.001869460 0.7026007 0.5416133
## 1.00 0.018694597 0.6468864 0.4489232
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.0186946.
train_predictions <- predict(model_LRM2, newdata = trainData, type = "raw")
train_accuracy <- mean(train_predictions == trainData$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.958241758241758"
mean_accuracy_model_LRM2 <- mean(model_LRM2$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM2)
## [1] 0.6964347
importance_model_LRM2 <- varImp(model_LRM2)
print(importance_model_LRM2)
## glmnet variable importance
##
## variables are sorted by maximum importance across the classes
## only 20 most important variables shown (out of 155)
##
## CN Dementia MCI
## PC1 80.669 100.000 0.000
## PC2 38.820 80.718 0.000
## cg00962106 56.188 9.092 33.490
## PC3 7.545 0.000 55.850
## cg19503462 26.318 48.653 6.549
## cg27452255 47.894 21.175 8.084
## cg07152869 27.958 45.984 1.304
## cg05096415 3.341 45.589 28.316
## cg02225060 18.278 12.770 45.585
## cg14710850 45.321 8.650 21.700
## cg02981548 23.093 5.917 45.292
## cg08861434 44.860 0.000 36.593
## cg03129555 14.448 42.011 10.562
## cg23432430 41.985 6.884 20.286
## cg16749614 8.921 17.012 41.732
## cg17186592 3.593 40.123 25.160
## cg14924512 1.856 38.979 23.218
## cg09584650 38.236 7.583 15.073
## cg06864789 13.555 38.080 11.895
## cg03084184 19.825 37.842 3.062
plot(importance_model_LRM2, top = 20, main = "Variable Importance Plot")
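For glmnet, `varImp()` is based on the absolute values of the model coefficients, and caret rescales importances to the 0–100 range by default (`scale = TRUE`), which is why the top entry in each table sits at exactly 100. A minimal sketch of that rescaling on hypothetical coefficient magnitudes:

```r
# caret-style rescaling of raw importances to [0, 100]
rescale_0_100 <- function(x) 100 * (x - min(x)) / (max(x) - min(x))
raw <- c(0.02, 0.50, 1.00)    # made-up absolute coefficients
round(rescale_0_100(raw), 2)  # 0.00 48.98 100.00
```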
importance_model_LRM2_df<-importance_model_LRM2$importance
if (METHOD_FEATURE_FLAG == 3 || METHOD_FEATURE_FLAG == 4 ||
    METHOD_FEATURE_FLAG == 5 || METHOD_FEATURE_FLAG == 6) {
  importance_final_model_LRM2 <- varImp(model_LRM2$finalModel)
  library(dplyr)
  ordered_importance_final_model_LRM2 <- importance_final_model_LRM2 %>%
    arrange(desc(Overall))
  print(ordered_importance_final_model_LRM2)
}
if (METHOD_FEATURE_FLAG == 1) {
  # For the multi-class case, take each feature's maximum importance
  # across the three classes and sort by it
  importance_model_LRM2_df$Feature <- rownames(importance_model_LRM2_df)
  importance_model_LRM2_df <- importance_model_LRM2_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))
  print(importance_model_LRM2_df)
}
## CN Dementia MCI Feature MaxImportance
## 1 80.669259842 100.00000000 0.000000000 PC1 100.0000000
## 2 38.819826123 80.71816175 0.000000000 PC2 80.7181617
## 3 56.188431205 9.09164715 33.489593631 cg00962106 56.1884312
## 4 7.544943839 0.00000000 55.849961855 PC3 55.8499619
## 5 26.318196432 48.65339365 6.548593830 cg19503462 48.6533937
## 6 47.893542555 21.17486031 8.084363477 cg27452255 47.8935426
## 7 27.958187019 45.98417901 1.303750081 cg07152869 45.9841790
## 8 3.341262898 45.58888440 28.316497949 cg05096415 45.5888844
## 9 18.278052161 12.77036198 45.585042685 cg02225060 45.5850427
## 10 45.320598074 8.65004585 21.700387368 cg14710850 45.3205981
## 11 23.092639735 5.91677683 45.292211019 cg02981548 45.2922110
## 12 44.859679163 0.00000000 36.593453540 cg08861434 44.8596792
## 13 14.448058636 42.01074159 10.561922277 cg03129555 42.0107416
## 14 41.985454200 6.88416969 20.286475569 cg23432430 41.9854542
## 15 8.920752156 17.01208508 41.732120444 cg16749614 41.7321204
## 16 3.592550338 40.12263291 25.159849272 cg17186592 40.1226329
## 17 1.855731768 38.97859755 23.218315159 cg14924512 38.9785975
## 18 38.236309354 7.58283248 15.073046350 cg09584650 38.2363094
## 19 13.554946716 38.08024109 11.894853106 cg06864789 38.0802411
## 20 19.824747398 37.84166204 3.061516033 cg03084184 37.8416620
## 21 21.488967819 0.51831348 37.518320557 cg11133939 37.5183206
## 22 13.594568831 37.19225285 9.121541523 cg00247094 37.1922529
## 23 0.537660714 20.67180038 35.708582637 cg08857872 35.7085826
## 24 35.474946045 7.95611218 14.040046921 cg16715186 35.4749460
## 25 4.940502697 35.05176837 17.442552868 cg24859648 35.0517684
## 26 14.088329742 34.55096035 5.434257605 cg12279734 34.5509604
## 27 1.732301196 34.09861514 18.438268058 cg25259265 34.0986151
## 28 8.421708924 34.05470276 11.639240231 cg06378561 34.0547028
## 29 2.324010869 13.34764031 31.973137867 cg26219488 31.9731379
## 30 12.467038988 31.58764570 5.780707189 cg20913114 31.5876457
## 31 5.484732273 11.24243460 31.368588674 cg16652920 31.3685887
## 32 1.408899970 30.96741176 17.374727709 cg05841700 30.9674118
## 33 29.670147976 14.07125259 0.805720137 cg26948066 29.6701480
## 34 28.731758811 12.27587513 0.036521290 cg03982462 28.7317588
## 35 28.243427299 8.07844067 6.650568053 cg11227702 28.2434273
## 36 6.457454462 28.03894248 8.127382196 cg09854620 28.0389425
## 37 27.473232929 0.00000000 21.551407733 cg06536614 27.4732329
## 38 7.543918226 9.69276964 27.091698371 cg02621446 27.0916984
## 39 0.000000000 26.98695797 24.136371725 cg02494911 26.9869580
## 40 20.449791085 0.00000000 26.626226562 cg12146221 26.6262266
## 41 0.000000000 25.79037246 26.599235660 cg00616572 26.5992357
## 42 9.535885104 26.42260873 5.646629735 cg10750306 26.4226087
## 43 26.161114130 7.86787376 6.044388665 cg15535896 26.1611141
## 44 1.140414878 25.92724802 13.643048905 cg01667144 25.9272480
## 45 0.000000000 25.63479737 13.465613434 cg24861747 25.6347974
## 46 25.544180258 15.09335718 0.000000000 cg10240127 25.5441803
## 47 24.123077324 0.00000000 25.104931027 cg02372404 25.1049310
## 48 1.108301582 8.18252039 25.042404517 cg06715136 25.0424045
## 49 24.823638654 0.00000000 16.133980825 cg20685672 24.8236387
## 50 0.000000000 24.76847756 14.644120460 cg05570109 24.7684776
## 51 24.742424464 0.00000000 13.437058583 cg04248279 24.7424245
## 52 4.019375709 5.52672503 24.335785443 cg20678988 24.3357854
## 53 0.000000000 24.19229754 18.406568128 cg12534577 24.1922975
## 54 0.000000000 24.13290147 15.849791626 cg16579946 24.1329015
## 55 4.819147175 24.10976268 5.710025426 cg12738248 24.1097627
## 56 6.534246461 5.92804640 24.066359926 cg16771215 24.0663599
## 57 24.001214719 10.16137182 0.028438699 cg13080267 24.0012147
## 58 5.506260122 5.66762058 23.059746909 cg17738613 23.0597469
## 59 22.316420364 6.53244240 5.660162074 cg11331837 22.3164204
## 60 0.000000000 22.27724618 17.226369352 cg01680303 22.2772462
## 61 22.209400690 0.00000000 13.206406219 cg04412904 22.2094007
## 62 0.000000000 22.07433613 14.947652373 cg18821122 22.0743361
## 63 3.420914645 7.32110136 22.052709331 cg12682323 22.0527093
## 64 22.037157524 16.26770399 0.000000000 cg02356645 22.0371575
## 65 0.000000000 20.82172312 22.015679299 cg24873924 22.0156793
## 66 0.000000000 15.83394476 22.004093274 cg10369879 22.0040933
## 67 6.478329203 21.71628843 0.933365662 cg01013522 21.7162884
## 68 16.474577405 0.00000000 21.596576172 cg12228670 21.5965762
## 69 7.510857942 21.11648791 0.000000000 cg07523188 21.1164879
## 70 21.107902395 18.07932642 0.000000000 cg15775217 21.1079024
## 71 20.979670490 0.00000000 16.905430808 cg03071582 20.9796705
## 72 20.954215725 0.00000000 12.112608571 cg05234269 20.9542157
## 73 0.000000000 20.89247737 7.906785371 cg20507276 20.8924774
## 74 0.000000000 19.10583911 20.819551819 cg27341708 20.8195518
## 75 13.168400188 20.44807173 0.000000000 cg25561557 20.4480717
## 76 20.438204928 8.86601949 0.351354534 cg03088219 20.4382049
## 77 20.385646876 0.00000000 19.552513738 cg01921484 20.3856469
## 78 4.715214588 20.18112982 4.193800260 cg26069044 20.1811298
## 79 20.106732907 0.00000000 7.568483801 cg06112204 20.1067329
## 80 20.068564990 0.00000000 10.296917569 cg25758034 20.0685650
## 81 20.064020309 0.22221328 9.404859744 cg17421046 20.0640203
## 82 19.729246407 0.00000000 12.793223406 cg11438323 19.7292464
## 83 19.720993502 0.00000000 9.891621910 cg17429539 19.7209935
## 84 19.520193965 14.87193029 0.000000000 cg00322003 19.5201940
## 85 19.320209563 4.15170201 4.747309123 cg11187460 19.3202096
## 86 2.514891780 5.41107283 18.965540286 cg25879395 18.9655403
## 87 4.051251003 18.83920170 0.228544679 cg26474732 18.8392017
## 88 2.894319944 18.78261721 2.425510843 cg23161429 18.7826172
## 89 1.682510941 4.78560234 18.689792363 cg20370184 18.6897924
## 90 18.638351920 0.02063057 6.337509719 cg25436480 18.6383519
## 91 0.009426087 7.64134319 18.621465414 cg13885788 18.6214654
## 92 11.441363910 18.24527959 0.000000000 cg23916408 18.2452796
## 93 0.000000000 16.67120442 18.160627205 cg14527649 18.1606272
## 94 5.007978338 1.01499164 18.053255407 cg10738648 18.0532554
## 95 0.000000000 17.96109645 12.787393058 cg23658987 17.9610965
## 96 5.985911703 17.93935983 1.285450739 cg18339359 17.9393598
## 97 10.254358605 0.00000000 17.833226289 cg07480176 17.8332263
## 98 2.976699166 17.78268135 4.061910218 cg12284872 17.7826814
## 99 16.803580593 17.77605719 0.000000000 cg26757229 17.7760572
## 100 8.049371238 17.47305028 0.000000000 cg24506579 17.4730503
## 101 17.444256746 8.51373947 0.000000000 cg02932958 17.4442567
## 102 13.323139031 0.00000000 17.349212778 cg00272795 17.3492128
## 103 0.000000000 7.44782166 17.192451883 cg12784167 17.1924519
## 104 16.760296629 0.00000000 6.637118816 cg03660162 16.7602966
## 105 0.000000000 16.02256145 16.446415451 cg16178271 16.4464155
## 106 16.358446021 0.00000000 11.973566568 cg27577781 16.3584460
## 107 16.135595474 0.00000000 8.267086014 cg07138269 16.1355955
## 108 15.967517751 2.87983309 2.060507011 cg05321907 15.9675178
## 109 0.763644301 15.69596760 2.146109991 cg22274273 15.6959676
## 110 0.465140829 3.15694966 15.545670122 cg15865722 15.5456701
## 111 13.421528890 15.52970945 0.000000000 cg21209485 15.5297095
## 112 15.459967702 0.63930260 3.690621649 cg20139683 15.4599677
## 113 0.806019496 15.27068730 2.249971940 cg15633912 15.2706873
## 114 1.777742277 15.20212112 0.498002092 cg00675157 15.2021211
## 115 0.000000000 15.00877535 13.718828048 cg21854924 15.0087753
## 116 0.000000000 8.30855718 14.973459912 cg14564293 14.9734599
## 117 1.414245340 14.67010021 1.622457833 cg01933473 14.6701002
## 118 14.371970002 0.00000000 2.335595291 cg06950937 14.3719700
## 119 7.032260151 0.00000000 14.261003577 cg14293999 14.2610036
## 120 0.000000000 7.60390702 14.096008410 cg01128042 14.0960084
## 121 13.961256044 0.00000000 2.032447525 cg12776173 13.9612560
## 122 13.948299380 0.00000000 13.909155305 cg03327352 13.9482994
## 123 8.335142999 0.00000000 13.922313003 cg24851651 13.9223130
## 124 13.710854192 0.00000000 7.312289015 cg00696044 13.7108542
## 125 8.530049377 0.00000000 13.699914296 cg19377607 13.6999143
## 126 0.000000000 2.79684895 13.614447171 cg01153376 13.6144472
## 127 13.575135706 3.87959208 0.000000000 cg19512141 13.5751357
## 128 0.000000000 6.30757114 13.531294057 cg18819889 13.5312941
## 129 8.872915122 0.00000000 13.122201001 cg27272246 13.1222010
## 130 12.212109825 0.00000000 12.993192871 cg08198851 12.9931929
## 131 0.000000000 9.81433817 12.660695198 cg06118351 12.6606952
## 132 4.067801497 12.39688646 0.000000000 cg10985055 12.3968865
## 133 0.923845356 11.76399533 0.005381082 cg16788319 11.7639953
## 134 1.051827355 11.74956908 0.000000000 cg14240646 11.7495691
## 135 0.791232287 11.56067436 0.390564952 cg00999469 11.5606744
## 136 0.000000000 11.34437197 10.931309813 cg12012426 11.3443720
## 137 0.000000000 2.67641565 10.890005132 cg01549082 10.8900051
## 138 10.738506551 0.00000000 9.162934157 cg21697769 10.7385066
## 139 10.648814810 0.00000000 7.606004675 cg07028768 10.6488148
## 140 10.321683281 3.96413046 0.000000000 cg17906851 10.3216833
## 141 0.000000000 8.37428680 9.799369121 cg27086157 9.7993691
## 142 0.306298996 9.75465887 0.000000000 cg06697310 9.7546589
## 143 9.742996011 9.22757841 0.000000000 cg08584917 9.7429960
## 144 0.599453522 9.52207363 0.000000000 cg02320265 9.5220736
## 145 2.496774081 0.00000000 9.504663621 cg04664583 9.5046636
## 146 4.882144523 0.00000000 8.715422642 cg14307563 8.7154226
## 147 6.234496873 0.00000000 8.452431120 cg08779649 8.4524311
## 148 0.000000000 6.07518883 7.339581693 cg00154902 7.3395817
## 149 0.000000000 0.00000000 6.392651374 cg12466610 6.3926514
## 150 6.359265415 4.09026644 0.000000000 cg27639199 6.3592654
## 151 0.000000000 5.86623615 4.807421306 cg00689685 5.8662361
## 152 0.000000000 2.92271158 5.199320216 cg15501526 5.1993202
## 153 2.835382514 0.00000000 0.000000000 cg01413796 2.8353825
## 154 0.421134550 0.00000000 0.566104791 age.now 0.5661048
## 155 0.000000000 0.43969964 0.043762166 cg11247378 0.4396996
if (METHOD_FEATURE_FLAG == 1) {
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")
  ggplot(importance_melted_LRM2_df,
         aes(x = reorder(Feature, -Importance),
             y = Importance, fill = Class)) +
    geom_bar(stat = "identity", position = "dodge") +
    coord_flip() +
    labs(title = "Feature Importance Across Classes",
         x = "Feature",
         y = "Importance",
         fill = "Class") +
    theme_minimal()
}
## Warning: The melt generic in data.table has been passed a data.frame and will attempt to
## redirect to the relevant reshape2 method; please note that reshape2 is superseded and is no
## longer actively developed, and this redirection is now deprecated. To continue using melt
## methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the
## namespace, i.e. reshape2::melt(.). In the next version, this warning will become an error.
if (METHOD_FEATURE_FLAG == 1) {
  print(importance_model_LRM2_df %>% head(20))
  print("the top 20 features based on max way:")
  print(head(importance_model_LRM2_df, n = 20)$Feature)
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    head(20) %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")
  ggplot(importance_melted_LRM2_df,
         aes(x = reorder(Feature, -Importance),
             y = Importance, fill = Class)) +
    geom_bar(stat = "identity", position = "dodge") +
    coord_flip() +
    labs(title = "Feature Importance Across Classes",
         x = "Feature",
         y = "Importance",
         fill = "Class") +
    theme_minimal()
}
## CN Dementia MCI Feature MaxImportance
## 1 80.669260 100.000000 0.000000 PC1 100.00000
## 2 38.819826 80.718162 0.000000 PC2 80.71816
## 3 56.188431 9.091647 33.489594 cg00962106 56.18843
## 4 7.544944 0.000000 55.849962 PC3 55.84996
## 5 26.318196 48.653394 6.548594 cg19503462 48.65339
## 6 47.893543 21.174860 8.084363 cg27452255 47.89354
## 7 27.958187 45.984179 1.303750 cg07152869 45.98418
## 8 3.341263 45.588884 28.316498 cg05096415 45.58888
## 9 18.278052 12.770362 45.585043 cg02225060 45.58504
## 10 45.320598 8.650046 21.700387 cg14710850 45.32060
## 11 23.092640 5.916777 45.292211 cg02981548 45.29221
## 12 44.859679 0.000000 36.593454 cg08861434 44.85968
## 13 14.448059 42.010742 10.561922 cg03129555 42.01074
## 14 41.985454 6.884170 20.286476 cg23432430 41.98545
## 15 8.920752 17.012085 41.732120 cg16749614 41.73212
## 16 3.592550 40.122633 25.159849 cg17186592 40.12263
## 17 1.855732 38.978598 23.218315 cg14924512 38.97860
## 18 38.236309 7.582832 15.073046 cg09584650 38.23631
## 19 13.554947 38.080241 11.894853 cg06864789 38.08024
## 20 19.824747 37.841662 3.061516 cg03084184 37.84166
## [1] "the top 20 features based on max way:"
## [1] "PC1" "PC2" "cg00962106" "PC3" "cg19503462" "cg27452255" "cg07152869"
## [8] "cg05096415" "cg02225060" "cg14710850" "cg02981548" "cg08861434" "cg03129555" "cg23432430"
## [15] "cg16749614" "cg17186592" "cg14924512" "cg09584650" "cg06864789" "cg03084184"
## Warning: The melt generic in data.table has been passed a data.frame and will attempt to
## redirect to the relevant reshape2 method; please note that reshape2 is superseded and is no
## longer actively developed, and this redirection is now deprecated. To continue using melt
## methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the
## namespace, i.e. reshape2::melt(.). In the next version, this warning will become an error.
if (METHOD_FEATURE_FLAG == 5) {
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "MCI"],
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG == 6) {
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "Dementia"],
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 3) {
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "CI"],
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData$DX)
for (class in classes) {
binary_labels <- ifelse(testData$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
# use one palette for both the curves and the legend so colors match
roc_cols <- seq_along(classes) + 1
plot(roc_curves[[1]], col = roc_cols[1],
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = roc_cols[i], lwd = 2)
}
legend("bottomright", legend = classes, col = roc_cols, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.8505
## The AUC value for class CN is: 0.850513
##
## Class: Dementia
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.8357
## The AUC value for class Dementia is: 0.8357143
##
## Class: MCI
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.8188
## The AUC value for class MCI is: 0.8188266
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
}
## The mean AUC value across all classes with one versus rest method is: 0.835018
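The loop above macro-averages the per-class AUCs by hand. pROC also provides `multiclass.roc()`, which returns a single multi-class AUC (Hand-Till) in one call; a minimal sketch on toy labels (random data, not the report's), shown only as an alternative check:

```r
library(pROC)
set.seed(1)
# toy 3-class response and a single numeric predictor column
labels <- factor(sample(c("CN", "Dementia", "MCI"), 60, replace = TRUE))
scores <- runif(60)
mroc <- multiclass.roc(labels, scores)
print(auc(mroc))  # multi-class AUC; near 0.5 for random scores
```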
df_ENM1<-processed_data
featureName_ENM1<-AfterProcess_FeatureName
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_ENM1$DX, p = 0.7, list = FALSE)
trainData_ENM1 <- df_ENM1[trainIndex, ]
testData_ENM1 <- df_ENM1[-trainIndex, ]
ctrl <- trainControl(method = "cv", number = 5)
param_grid <- expand.grid(alpha = 0:1, lambda = seq(0.001, 1, length = 20))
elastic_net_model1 <- caret::train(DX ~ ., data = trainData_ENM1, method = "glmnet",
trControl = ctrl, tuneGrid = param_grid)
print(elastic_net_model1)
## glmnet
##
## 455 samples
## 155 predictors
## 3 classes: 'CN', 'Dementia', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 364, 365, 363, 364, 364
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0 0.00100000 0.6593476 0.42673396
## 0 0.05357895 0.6725349 0.43439423
## 0 0.10615789 0.6747338 0.43094148
## 0 0.15873684 0.6725599 0.42391171
## 0 0.21131579 0.6725837 0.41818370
## 0 0.26389474 0.6770526 0.42406079
## 0 0.31647368 0.6769804 0.41856449
## 0 0.36905263 0.6726087 0.40853473
## 0 0.42163158 0.6638170 0.38542265
## 0 0.47421053 0.6660148 0.38902178
## 0 0.52678947 0.6594214 0.37628816
## 0 0.57936842 0.6550252 0.36510400
## 0 0.63194737 0.6528274 0.35927177
## 0 0.68452632 0.6418618 0.33471759
## 0 0.73710526 0.6352200 0.31832804
## 0 0.78968421 0.6307756 0.30720022
## 0 0.84226316 0.6263800 0.29777058
## 0 0.89484211 0.6220322 0.28739881
## 0 0.94742105 0.6220322 0.28739881
## 0 1.00000000 0.6220322 0.28682520
## 1 0.00100000 0.6240596 0.37352512
## 1 0.05357895 0.5187546 0.05457313
## 1 0.10615789 0.5142862 0.00000000
## 1 0.15873684 0.5142862 0.00000000
## 1 0.21131579 0.5142862 0.00000000
## 1 0.26389474 0.5142862 0.00000000
## 1 0.31647368 0.5142862 0.00000000
## 1 0.36905263 0.5142862 0.00000000
## 1 0.42163158 0.5142862 0.00000000
## 1 0.47421053 0.5142862 0.00000000
## 1 0.52678947 0.5142862 0.00000000
## 1 0.57936842 0.5142862 0.00000000
## 1 0.63194737 0.5142862 0.00000000
## 1 0.68452632 0.5142862 0.00000000
## 1 0.73710526 0.5142862 0.00000000
## 1 0.78968421 0.5142862 0.00000000
## 1 0.84226316 0.5142862 0.00000000
## 1 0.89484211 0.5142862 0.00000000
## 1 0.94742105 0.5142862 0.00000000
## 1 1.00000000 0.5142862 0.00000000
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0 and lambda = 0.2638947.
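Note that the selected `alpha = 0` means the elastic net collapsed to pure ridge here (`alpha = 1` would be pure lasso). The CV-selected pair can be read back from the caret object's `bestTune` slot; a minimal sketch on `iris` (toy data, not the report's):

```r
library(caret)
set.seed(123)
# small grid: alpha mixes ridge (0) and lasso (1); lambda sets penalty strength
grid <- expand.grid(alpha = c(0, 0.5, 1), lambda = c(0.001, 0.01, 0.1))
fit <- caret::train(Species ~ ., data = iris, method = "glmnet",
                    trControl = trainControl(method = "cv", number = 3),
                    tuneGrid = grid)
print(fit$bestTune)  # the (alpha, lambda) pair chosen by CV accuracy
```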
mean_accuracy_elastic_net_model1 <- mean(elastic_net_model1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
FeatEval_Mean_mean_accuracy_cv_ENM1<-mean_accuracy_elastic_net_model1
print(FeatEval_Mean_mean_accuracy_cv_ENM1)
## [1] 0.5868952
train_predictions <- predict(elastic_net_model1, newdata = trainData_ENM1, type = "raw")
train_accuracy <- mean(train_predictions == trainData_ENM1$DX)
FeatEval_Mean_ENM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.863736263736264"
print(FeatEval_Mean_ENM1_trainAccuracy)
## [1] 0.8637363
predictions <- predict(elastic_net_model1, newdata = testData_ENM1)
cm_FeatEval_Mean_ENM1 <- caret::confusionMatrix(predictions,testData_ENM1$DX)
print(cm_FeatEval_Mean_ENM1)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN Dementia MCI
## CN 45 5 13
## Dementia 0 8 0
## MCI 21 15 86
##
## Overall Statistics
##
## Accuracy : 0.7202
## 95% CI : (0.6512, 0.7823)
## No Information Rate : 0.513
## P-Value [Acc > NIR] : 3.473e-09
##
## Kappa : 0.4987
##
## Mcnemar's Test P-Value : 6.901e-05
##
## Statistics by Class:
##
## Class: CN Class: Dementia Class: MCI
## Sensitivity 0.6818 0.28571 0.8687
## Specificity 0.8583 1.00000 0.6170
## Pos Pred Value 0.7143 1.00000 0.7049
## Neg Pred Value 0.8385 0.89189 0.8169
## Prevalence 0.3420 0.14508 0.5130
## Detection Rate 0.2332 0.04145 0.4456
## Detection Prevalence 0.3264 0.04145 0.6321
## Balanced Accuracy 0.7700 0.64286 0.7429
cm_FeatEval_Mean_ENM1_Accuracy<-cm_FeatEval_Mean_ENM1$overall["Accuracy"]
cm_FeatEval_Mean_ENM1_Kappa<-cm_FeatEval_Mean_ENM1$overall["Kappa"]
print(cm_FeatEval_Mean_ENM1_Accuracy)
## Accuracy
## 0.7202073
print(cm_FeatEval_Mean_ENM1_Kappa)
## Kappa
## 0.4986772
importance_elastic_net_model1<- varImp(elastic_net_model1)
print(importance_elastic_net_model1)
## glmnet variable importance
##
## variables are sorted by maximum importance across the classes
## only 20 most important variables shown (out of 155)
##
## CN Dementia MCI
## PC1 86.62 100.000 13.315
## PC2 68.42 88.610 20.131
## cg00962106 72.97 12.360 60.542
## cg02225060 43.14 18.830 62.030
## cg02981548 49.97 8.976 59.006
## cg23432430 57.29 15.758 41.468
## cg14710850 54.50 8.365 46.076
## cg16749614 20.68 33.681 54.421
## cg07152869 48.29 54.287 5.938
## cg08857872 29.00 24.418 53.479
## cg16652920 27.04 25.381 52.480
## cg26948066 51.16 42.094 9.007
## PC3 12.12 38.675 50.853
## cg08861434 48.61 1.032 49.702
## cg27452255 49.50 29.752 19.681
## cg09584650 48.12 20.551 27.501
## cg11133939 31.91 15.806 47.783
## cg19503462 47.24 44.918 2.255
## cg06864789 20.57 46.483 25.849
## cg02372404 30.74 14.687 45.489
plot(importance_elastic_net_model1, top = 20, main = "Variable Importance Plot")
importance_elastic_net_model1_df<-importance_elastic_net_model1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 ||METHOD_FEATURE_FLAG==6 ){
importance_elastic_net_final_model1 <- varImp(elastic_net_model1$finalModel)
library(dplyr)
Ordered_importance_elastic_net_final_model1 <- importance_elastic_net_final_model1 %>% arrange(desc(Overall))
print(Ordered_importance_elastic_net_final_model1)
}
if(METHOD_FEATURE_FLAG==1){
# for the multi classification case,
# for each feature, we will choose the maximum importance value
# Add a column for the maximum importance
importance_elastic_net_model1_df$Feature<-rownames(importance_elastic_net_model1_df)
importance_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_elastic_net_model1_df)
}
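The "max way" aggregation above can be illustrated on a toy importance table: per-class scores collapse to one ranking by taking each feature's maximum across classes.

```r
library(dplyr)
# hypothetical two-feature importance table (not the report's values)
imp <- data.frame(CN = c(10, 80), Dementia = c(90, 20), MCI = c(30, 40),
                  Feature = c("cg_A", "cg_B"))
imp <- imp %>%
  mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
  arrange(desc(MaxImportance))
print(imp$Feature)  # cg_A ranks first: its Dementia score (90) dominates
```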
## CN Dementia MCI Feature MaxImportance
## 1 86.62120555 1.000000e+02 13.3154938 PC1 100.0000000
## 2 68.41506593 8.860969e+01 20.1313229 PC2 88.6096894
## 3 72.96616233 1.236037e+01 60.5424892 cg00962106 72.9661623
## 4 43.13680835 1.882953e+01 62.0296384 cg02225060 62.0296384
## 5 49.96682179 8.975883e+00 59.0060055 cg02981548 59.0060055
## 6 57.28920981 1.575794e+01 41.4679663 cg23432430 57.2892098
## 7 54.50457856 8.365086e+00 46.0761921 cg14710850 54.5045786
## 8 20.67732060 3.368079e+01 54.4214110 cg16749614 54.4214110
## 9 48.28564292 5.428696e+01 5.9380180 cg07152869 54.2869615
## 10 28.99802672 2.441777e+01 53.4791022 cg08857872 53.4791022
## 11 27.03540858 2.538148e+01 52.4801847 cg16652920 52.4801847
## 12 51.16479077 4.209436e+01 9.0071343 cg26948066 51.1647908
## 13 12.11541799 3.867459e+01 50.8533134 PC3 50.8533134
## 14 48.60716543 1.031662e+00 49.7021276 cg08861434 49.7021276
## 15 49.49661951 2.975203e+01 19.6812848 cg27452255 49.4966195
## 16 48.11515416 2.055094e+01 27.5009105 cg09584650 48.1151542
## 17 31.91327900 1.580637e+01 47.7829449 cg11133939 47.7829449
## 18 47.23672408 4.491797e+01 2.2554575 cg19503462 47.2367241
## 19 20.57040391 4.648297e+01 25.8492634 cg06864789 46.4829680
## 20 30.73922491 1.468689e+01 45.4894183 cg02372404 45.4894183
## 21 13.69661443 4.531850e+01 31.5585879 cg24859648 45.3185029
## 22 10.38190144 3.472253e+01 45.1677367 cg14527649 45.1677367
## 23 44.71353025 3.266273e+01 11.9874954 cg03982462 44.7135302
## 24 43.77701152 1.498667e+01 28.7270404 cg06536614 43.7770115
## 25 0.05742514 4.329817e+01 43.1774484 cg17186592 43.2981742
## 26 26.35394431 1.675139e+01 43.1686358 cg26219488 43.1686358
## 27 42.96370404 1.408099e+01 28.8194115 cg10240127 42.9637040
## 28 13.43767401 4.289926e+01 29.3982818 cg00247094 42.8992564
## 29 35.47297173 6.860107e+00 42.3963796 cg20685672 42.3963796
## 30 3.59530245 4.215625e+01 38.4976467 cg25259265 42.1562498
## 31 42.14113645 1.425879e+01 27.8190416 cg16715186 42.1411365
## 32 0.72769110 4.194567e+01 41.1546792 cg05096415 41.9456709
## 33 34.83831577 4.176667e+01 6.8650561 cg15775217 41.7666725
## 34 15.96690586 4.058821e+01 24.5580004 cg24861747 40.5882068
## 35 34.02934141 6.216275e+00 40.3089166 cg07028768 40.3089166
## 36 4.42781931 3.973144e+01 35.2403198 cg14924512 39.7314398
## 37 24.97951953 3.964246e+01 14.5996349 cg03084184 39.6424550
## 38 4.47173997 3.907243e+01 34.5373846 cg05570109 39.0724252
## 39 34.88057483 4.000672e+00 38.9445475 cg01921484 38.9445475
## 40 9.76127248 2.779390e+01 37.6184714 cg00154902 37.6184714
## 41 28.32748759 3.744433e+01 9.0535424 cg26757229 37.4443306
## 42 37.35561895 9.847796e+00 27.4445223 cg03660162 37.3556189
## 43 35.88442086 5.170695e-01 36.4647910 cg12228670 36.4647910
## 44 4.42310262 3.173869e+01 36.2250936 cg00616572 36.2250936
## 45 14.11894393 3.616405e+01 21.9818063 cg20507276 36.1640508
## 46 5.45777159 3.544527e+01 29.9241978 cg05841700 35.4452700
## 47 21.86701375 1.351361e+01 35.4439265 cg06715136 35.4439265
## 48 22.83529876 1.227374e+01 35.1723403 cg02621446 35.1723403
## 49 18.36208785 3.501828e+01 16.5928893 cg12738248 35.0182778
## 50 14.22696004 3.493687e+01 20.6466078 cg09854620 34.9368684
## 51 32.22385216 3.482028e+01 2.5331259 cg00322003 34.8202787
## 52 8.08624317 2.660459e+01 34.7541343 cg24873924 34.7541343
## 53 14.18196370 3.469870e+01 20.4534323 cg03129555 34.6986966
## 54 34.67728863 7.588431e+00 27.0255567 cg04412904 34.6772886
## 55 15.01426631 1.956865e+01 34.6462192 cg17738613 34.6462192
## 56 18.92211499 1.559144e+01 34.5768598 cg25879395 34.5768598
## 57 34.34194321 1.088790e+01 23.3907383 cg05234269 34.3419432
## 58 22.74955912 3.407311e+01 11.2602539 cg20913114 34.0731137
## 59 1.10574447 3.257061e+01 33.7396527 cg02494911 33.7396527
## 60 17.46672005 3.350959e+01 15.9795697 cg00675157 33.5095904
## 61 26.91097526 3.346748e+01 6.4932061 cg12279734 33.4674820
## 62 12.81125802 2.054875e+01 33.4233079 cg01153376 33.4233079
## 63 30.29546006 2.966941e+00 33.3257016 cg04248279 33.3257016
## 64 30.64271695 3.320881e+01 2.5027880 cg06697310 33.2088056
## 65 19.20109381 1.362764e+01 32.8920329 cg16771215 32.8920329
## 66 25.57285198 3.289020e+01 7.2540515 cg26474732 32.8902041
## 67 1.21338315 3.269540e+01 31.4187171 cg12534577 32.6954009
## 68 14.55218922 3.243695e+01 17.8214588 cg06378561 32.4369487
## 69 19.19031334 1.316187e+01 32.4154803 cg18819889 32.4154803
## 70 29.77580872 3.222224e+01 2.3831323 cg01013522 32.2222417
## 71 8.93820972 2.321177e+01 32.2132838 cg10369879 32.2132838
## 72 31.33699934 9.314901e+00 21.9587973 cg03327352 31.3369993
## 73 31.30078323 8.697086e+00 22.5403967 cg07138269 31.3007832
## 74 30.28086989 7.153738e-01 31.0595443 cg12146221 31.0595443
## 75 31.01515677 1.154188e+01 19.4099769 cg11227702 31.0151568
## 76 30.50997690 2.051326e-01 30.7784101 cg27577781 30.7784101
## 77 30.73604217 2.929706e+01 1.3756812 cg02356645 30.7360422
## 78 10.88804539 1.960641e+01 30.5577524 cg15865722 30.5577524
## 79 21.12659334 3.052710e+01 9.3372037 cg18339359 30.5270977
## 80 21.72379588 3.049938e+01 8.7122800 cg08584917 30.4993765
## 81 30.48187501 1.623371e+01 14.1848668 cg15535896 30.4818750
## 82 9.34486828 3.034689e+01 20.9387194 cg01680303 30.3468883
## 83 0.66118138 2.956653e+01 30.2910133 cg01667144 30.2910133
## 84 17.55766315 2.993390e+01 12.3129407 cg07523188 29.9339044
## 85 12.71980384 1.708478e+01 29.8678851 cg21854924 29.8678851
## 86 9.99015500 2.974249e+01 19.6890322 cg10750306 29.7424878
## 87 5.72469553 2.961587e+01 23.8278786 cg16579946 29.6158747
## 88 29.45305133 5.870075e+00 23.5196762 cg11438323 29.4530513
## 89 7.90125584 2.936465e+01 21.4000924 cg18821122 29.3646489
## 90 13.47339382 1.551441e+01 29.0511081 cg01128042 29.0511081
## 91 12.43918028 1.650836e+01 29.0108418 cg14564293 29.0108418
## 92 28.69944088 4.408577e-01 28.1952826 cg08198851 28.6994409
## 93 25.92061288 2.700534e+00 28.6844472 cg00696044 28.6844472
## 94 28.64639261 7.487005e+00 21.0960870 cg17421046 28.6463926
## 95 28.22189323 1.423101e+01 13.9275795 cg11331837 28.2218932
## 96 4.57947370 2.318215e+01 27.8249201 cg12682323 27.8249201
## 97 27.75324966 2.314613e+01 4.5438216 cg02932958 27.7532497
## 98 2.23125343 2.770568e+01 25.4111304 cg23658987 27.7056844
## 99 13.54232520 1.406008e+01 27.6657079 cg07480176 27.6657079
## 100 18.99349539 8.561292e+00 27.6180885 cg10738648 27.6180885
## 101 23.24340158 4.224666e+00 27.5313687 cg03071582 27.5313687
## 102 27.50590369 1.371648e+01 13.7261241 cg25758034 27.5059037
## 103 8.31694986 1.850464e+01 26.8848892 cg06118351 26.8848892
## 104 26.47439884 2.668353e+01 0.1458283 cg19512141 26.6835278
## 105 15.77673761 2.662697e+01 10.7869322 cg23161429 26.6269705
## 106 13.98076844 2.639473e+01 12.3506587 cg11247378 26.3947278
## 107 18.59031320 7.685003e+00 26.3386166 cg20678988 26.3386166
## 108 14.36946682 1.154490e+01 25.9776676 cg27086157 25.9776676
## 109 25.84471361 9.776568e+00 16.0048448 cg03088219 25.8447136
## 110 13.62887699 2.527551e+01 11.5833356 cg22274273 25.2755132
## 111 2.73157059 2.236077e+01 25.1556395 cg13885788 25.1556395
## 112 7.97199814 1.668181e+01 24.7171056 cg14240646 24.7171056
## 113 23.64743847 7.878552e-01 24.4985943 cg06112204 24.4985943
## 114 24.37541925 4.910134e+00 19.4019850 cg17429539 24.3754193
## 115 23.05439563 2.435219e+01 1.2344903 cg25561557 24.3521866
## 116 21.11737928 3.135116e+00 24.3157963 cg14293999 24.3157963
## 117 15.52438712 8.640581e+00 24.2282683 cg19377607 24.2282683
## 118 21.13723634 2.411155e+01 2.9110134 cg06950937 24.1115504
## 119 24.09543447 4.091862e+00 19.9402723 cg25436480 24.0954345
## 120 14.61419073 9.017717e+00 23.6952080 cg00272795 23.6952080
## 121 10.00780864 1.338601e+01 23.4571172 cg12012426 23.4571172
## 122 23.37986405 1.718248e+01 6.1340826 cg05321907 23.3798640
## 123 23.15469253 9.974091e+00 13.1173009 cg20139683 23.1546925
## 124 0.72251260 2.312762e+01 22.3418067 cg26069044 23.1276199
## 125 21.02399526 2.241551e+01 1.3282102 cg23916408 22.4155061
## 126 0.60421044 2.223064e+01 21.5631268 cg27341708 22.2306378
## 127 15.96983051 2.220762e+01 6.1744927 cg13080267 22.2076238
## 128 21.86254895 1.299155e+00 20.5000930 cg27272246 21.8625490
## 129 0.95607645 2.184173e+01 20.8223545 cg12284872 21.8417315
## 130 2.40807486 2.169957e+01 19.2281930 cg00689685 21.6995684
## 131 2.01248824 2.152691e+01 19.4511227 cg16178271 21.5269115
## 132 21.27794768 8.125429e+00 13.0892177 cg21209485 21.2779477
## 133 20.58897269 1.059029e+01 9.9353779 cg24851651 20.5889727
## 134 20.33635501 7.328012e+00 12.9450419 cg21697769 20.3363550
## 135 20.33048061 6.214862e+00 14.0523177 cg04664583 20.3304806
## 136 14.64078345 1.993511e+01 5.2310225 cg00999469 19.9351066
## 137 2.26784740 1.742905e+01 19.7602009 cg20370184 19.7602009
## 138 18.98159250 4.184462e+00 14.7338294 cg11187460 18.9815925
## 139 18.43567987 1.998139e+00 16.3742404 cg12784167 18.4356799
## 140 1.20208781 1.698257e+01 18.2479601 cg02320265 18.2479601
## 141 17.49161956 1.357721e+01 3.8511068 cg12776173 17.4916196
## 142 17.27715467 1.271737e+00 15.9421167 cg08779649 17.2771547
## 143 8.18293699 8.988656e+00 17.2348934 cg01933473 17.2348934
## 144 17.18534234 8.948602e+00 8.1734396 cg15501526 17.1853423
## 145 13.77337037 1.693396e+01 3.0972842 cg10985055 16.9339552
## 146 16.16407186 6.750172e+00 9.3505990 cg17906851 16.1640719
## 147 11.29937443 4.707856e+00 16.0705306 cg14307563 16.0705306
## 148 4.33271953 1.431148e+01 9.9154553 cg16788319 14.3114754
## 149 11.34917622 1.384179e+01 2.4293151 cg24506579 13.8417919
## 150 9.52161846 1.242055e+01 2.8356319 cg27639199 12.4205510
## 151 1.91338340 1.029514e+01 12.2718262 cg12466610 12.2718262
## 152 9.00378501 2.189396e+00 11.2564819 cg15633912 11.2564819
## 153 0.00000000 1.116843e+01 11.2317321 cg01413796 11.2317321
## 154 1.45869467 1.880830e-01 1.7100783 cg01549082 1.7100783
## 155 0.70747253 6.047179e-03 0.7768203 age.now 0.7768203
if(METHOD_FEATURE_FLAG == 1){
importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
dplyr::select(-MaxImportance) %>%
reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_elastic_net_model1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
## Warning: The melt generic in data.table has been passed a data.frame and will attempt to
## redirect to the relevant reshape2 method; please note that reshape2 is superseded and is no
## longer actively developed, and this redirection is now deprecated. To continue using melt
## methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the
## namespace, i.e. reshape2::melt(.). In the next version, this warning will become an error.
if(METHOD_FEATURE_FLAG == 1){
print(importance_elastic_net_model1_df %>% head(20))
print("the top 20 features based on max way:")
print(head(importance_elastic_net_model1_df,n=20)$Feature)
importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_elastic_net_model1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
## CN Dementia MCI Feature MaxImportance
## 1 86.62121 100.000000 13.315494 PC1 100.00000
## 2 68.41507 88.609689 20.131323 PC2 88.60969
## 3 72.96616 12.360372 60.542489 cg00962106 72.96616
## 4 43.13681 18.829529 62.029638 cg02225060 62.02964
## 5 49.96682 8.975883 59.006005 cg02981548 59.00601
## 6 57.28921 15.757943 41.467966 cg23432430 57.28921
## 7 54.50458 8.365086 46.076192 cg14710850 54.50458
## 8 20.67732 33.680790 54.421411 cg16749614 54.42141
## 9 48.28564 54.286962 5.938018 cg07152869 54.28696
## 10 28.99803 24.417775 53.479102 cg08857872 53.47910
## 11 27.03541 25.381475 52.480185 cg16652920 52.48018
## 12 51.16479 42.094356 9.007134 cg26948066 51.16479
## 13 12.11542 38.674595 50.853313 PC3 50.85331
## 14 48.60717 1.031662 49.702128 cg08861434 49.70213
## 15 49.49662 29.752034 19.681285 cg27452255 49.49662
## 16 48.11515 20.550943 27.500911 cg09584650 48.11515
## 17 31.91328 15.806365 47.782945 cg11133939 47.78294
## 18 47.23672 44.917966 2.255458 cg19503462 47.23672
## 19 20.57040 46.482968 25.849263 cg06864789 46.48297
## 20 30.73922 14.686893 45.489418 cg02372404 45.48942
## [1] "the top 20 features based on max way:"
## [1] "PC1" "PC2" "cg00962106" "cg02225060" "cg02981548" "cg23432430" "cg14710850"
## [8] "cg16749614" "cg07152869" "cg08857872" "cg16652920" "cg26948066" "PC3" "cg08861434"
## [15] "cg27452255" "cg09584650" "cg11133939" "cg19503462" "cg06864789" "cg02372404"
## Warning: The melt generic in data.table has been passed a data.frame and will attempt to
## redirect to the relevant reshape2 method; please note that reshape2 is superseded and is no
## longer actively developed, and this redirection is now deprecated. To continue using melt
## methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the
## namespace, i.e. reshape2::melt(.). In the next version, this warning will become an error.
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_ENM1_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_ENM1_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_ENM1_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==1){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData_ENM1$DX)
for (class in classes) {
binary_labels <- ifelse(testData_ENM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
# use one palette for both the curves and the legend so colors match
roc_cols <- seq_along(classes) + 1
plot(roc_curves[[1]], col = roc_cols[1],
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = roc_cols[i], lwd = 2)
}
legend("bottomright", legend = classes, col = roc_cols, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.8682
## The AUC value for class CN is: 0.8681699
##
## Class: Dementia
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.8656
## The AUC value for class Dementia is: 0.8655844
##
## Class: MCI
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.8361
## The AUC value for class MCI is: 0.8361272
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Mean_ENM1_AUC <- mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.8566272
library(caret)
library(xgboost)
library(dplyr)
library(doParallel)
numCores <- detectCores() - 1
c2 <- makeCluster(numCores)
registerDoParallel(c2)
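The worker cluster registered here is never released later in this section, so the worker processes linger after training. A minimal sketch of the teardown, using a stand-in cluster object (assumption: same doParallel setup as above):

```r
library(doParallel)
cl <- makeCluster(2)   # stand-in for the c2 cluster above
registerDoParallel(cl)
# ... run caret::train(..., trControl = trainControl(allowParallel = TRUE)) ...
stopCluster(cl)        # release the worker processes
registerDoSEQ()        # restore the sequential foreach backend
print(foreach::getDoParWorkers())  # back to a single worker
```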
df_XGB1<-processed_data
featureName_XGB1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_XGB1$DX, p = 0.7, list = FALSE)
trainData_XGB1<- df_XGB1[trainIndex, ]
testData_XGB1 <- df_XGB1[-trainIndex, ]
cv_control <- trainControl(method = "cv", number = 5, allowParallel = TRUE)
xgb_model <- caret::train(
DX ~ ., data = trainData_XGB1,
method = "xgbTree", trControl = cv_control,
metric = "Accuracy"
)
print(xgb_model)
## eXtreme Gradient Boosting
##
## 455 samples
## 155 predictors
## 3 classes: 'CN', 'Dementia', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 364, 365, 363, 364, 364
## Resampling results across tuning parameters:
##
## eta max_depth colsample_bytree subsample nrounds Accuracy Kappa
## 0.3 1 0.6 0.50 50 0.5604677 0.18305636
## 0.3 1 0.6 0.50 100 0.5692090 0.22745054
## 0.3 1 0.6 0.50 150 0.5801996 0.24875370
## 0.3 1 0.6 0.75 50 0.5517710 0.15720609
## 0.3 1 0.6 0.75 100 0.5649350 0.20438528
## 0.3 1 0.6 0.75 150 0.5628343 0.20763607
## 0.3 1 0.6 1.00 50 0.5122095 0.08805782
## 0.3 1 0.6 1.00 100 0.5452498 0.16811878
## 0.3 1 0.6 1.00 150 0.5606583 0.20408170
## 0.3 1 0.8 0.50 50 0.5692090 0.20998376
## 0.3 1 0.8 0.50 100 0.5846430 0.25722266
## 0.3 1 0.8 0.50 150 0.5779524 0.24804711
## 0.3 1 0.8 0.75 50 0.5428115 0.15376293
## 0.3 1 0.8 0.75 100 0.5583394 0.19598168
## 0.3 1 0.8 0.75 150 0.5650289 0.21626454
## 0.3 1 0.8 1.00 50 0.5210251 0.10441192
## 0.3 1 0.8 1.00 100 0.5430270 0.15979995
## 0.3 1 0.8 1.00 150 0.5562133 0.19365639
## 0.3 2 0.6 0.50 50 0.5384865 0.16265986
## 0.3 2 0.6 0.50 100 0.5669863 0.22298076
## 0.3 2 0.6 0.50 150 0.5693534 0.23182757
## 0.3 2 0.6 0.75 50 0.5583644 0.18712040
## 0.3 2 0.6 0.75 100 0.5937012 0.26024271
## 0.3 2 0.6 0.75 150 0.5826140 0.24618593
## 0.3 2 0.6 1.00 50 0.5408053 0.15939812
## 0.3 2 0.6 1.00 100 0.5516733 0.18052472
## 0.3 2 0.6 1.00 150 0.5649090 0.21413874
## 0.3 2 0.8 0.50 50 0.5869390 0.23990826
## 0.3 2 0.8 0.50 100 0.5824930 0.23552234
## 0.3 2 0.8 0.50 150 0.5758996 0.22732182
## 0.3 2 0.8 0.75 50 0.5934820 0.25135060
## 0.3 2 0.8 0.75 100 0.5912836 0.25346437
## 0.3 2 0.8 0.75 150 0.5978776 0.26886330
## 0.3 2 0.8 1.00 50 0.5540160 0.17911553
## 0.3 2 0.8 1.00 100 0.5451043 0.16459816
## 0.3 2 0.8 1.00 150 0.5670351 0.21007127
## 0.3 3 0.6 0.50 50 0.5650783 0.21060229
## 0.3 3 0.6 0.50 100 0.5606822 0.20850911
## 0.3 3 0.6 0.50 150 0.5629044 0.21133011
## 0.3 3 0.6 0.75 50 0.5780485 0.22322683
## 0.3 3 0.6 0.75 100 0.5956570 0.25481644
## 0.3 3 0.6 0.75 150 0.6022504 0.27090892
## 0.3 3 0.6 1.00 50 0.5781701 0.22288203
## 0.3 3 0.6 1.00 100 0.5869385 0.23776743
## 0.3 3 0.6 1.00 150 0.5803690 0.23301708
## 0.3 3 0.8 0.50 50 0.5450565 0.16059080
## 0.3 3 0.8 0.50 100 0.5650311 0.20879832
## 0.3 3 0.8 0.50 150 0.5737745 0.22394322
## 0.3 3 0.8 0.75 50 0.5516749 0.17172162
## 0.3 3 0.8 0.75 100 0.5715528 0.21250756
## 0.3 3 0.8 0.75 150 0.5692817 0.20717324
## 0.3 3 0.8 1.00 50 0.5605882 0.18804779
## 0.3 3 0.8 1.00 100 0.5606110 0.18873086
## 0.3 3 0.8 1.00 150 0.5649589 0.19776848
## 0.4 1 0.6 0.50 50 0.5318442 0.15010166
## 0.4 1 0.6 0.50 100 0.5670590 0.22666754
## 0.4 1 0.6 0.50 150 0.5670584 0.23727499
## 0.4 1 0.6 0.75 50 0.5385582 0.16666479
## 0.4 1 0.6 0.75 100 0.5758741 0.24000710
## 0.4 1 0.6 0.75 150 0.5649573 0.22879025
## 0.4 1 0.6 1.00 50 0.5409269 0.14921611
## 0.4 1 0.6 1.00 100 0.5585571 0.19773274
## 0.4 1 0.6 1.00 150 0.5694739 0.22587501
## 0.4 1 0.8 0.50 50 0.5430042 0.16638048
## 0.4 1 0.8 0.50 100 0.5584371 0.21323998
## 0.4 1 0.8 0.50 150 0.5738462 0.24123026
## 0.4 1 0.8 0.75 50 0.5606349 0.18669584
## 0.4 1 0.8 0.75 100 0.5496199 0.18521750
## 0.4 1 0.8 0.75 150 0.5803180 0.24933277
## 0.4 1 0.8 1.00 50 0.5343091 0.13804392
## 0.4 1 0.8 1.00 100 0.5474954 0.18343496
## 0.4 1 0.8 1.00 150 0.5605861 0.21083173
## 0.4 2 0.6 0.50 50 0.5295748 0.15350756
## 0.4 2 0.6 0.50 100 0.5604661 0.21432524
## 0.4 2 0.6 0.50 150 0.5583877 0.20905591
## 0.4 2 0.6 0.75 50 0.5714790 0.22465117
## 0.4 2 0.6 0.75 100 0.5539927 0.19184765
## 0.4 2 0.6 0.75 150 0.5605378 0.21420363
## 0.4 2 0.6 1.00 50 0.5671312 0.19986270
## 0.4 2 0.6 1.00 100 0.5626140 0.20310297
## 0.4 2 0.6 1.00 150 0.5825407 0.24632408
## 0.4 2 0.8 0.50 50 0.5495249 0.19461294
## 0.4 2 0.8 0.50 100 0.5650311 0.22003249
## 0.4 2 0.8 0.50 150 0.5760450 0.25188625
## 0.4 2 0.8 0.75 50 0.5497903 0.18106700
## 0.4 2 0.8 0.75 100 0.5847640 0.25087776
## 0.4 2 0.8 0.75 150 0.5694017 0.22664696
## 0.4 2 0.8 1.00 50 0.5692807 0.20731718
## 0.4 2 0.8 1.00 100 0.5826135 0.23323877
## 0.4 2 0.8 1.00 150 0.5912842 0.25931344
## 0.4 3 0.6 0.50 50 0.5690641 0.22158735
## 0.4 3 0.6 0.50 100 0.5670362 0.22452003
## 0.4 3 0.6 0.50 150 0.5582439 0.20883481
## 0.4 3 0.6 0.75 50 0.5670829 0.21530295
## 0.4 3 0.6 0.75 100 0.5846902 0.24552308
## 0.4 3 0.6 0.75 150 0.5780480 0.23934597
## 0.4 3 0.6 1.00 50 0.5691612 0.20525775
## 0.4 3 0.6 1.00 100 0.5801263 0.23012788
## 0.4 3 0.6 1.00 150 0.5800536 0.23200011
## 0.4 3 0.8 0.50 50 0.6022010 0.27466788
## 0.4 3 0.8 0.50 100 0.5978054 0.27332851
## 0.4 3 0.8 0.50 150 0.5911876 0.26561069
## 0.4 3 0.8 0.75 50 0.5824691 0.22001427
## 0.4 3 0.8 0.75 100 0.5891347 0.23859637
## 0.4 3 0.8 0.75 150 0.5890869 0.24245473
## 0.4 3 0.8 1.00 50 0.5604905 0.18793888
## 0.4 3 0.8 1.00 100 0.5627377 0.20109751
## 0.4 3 0.8 1.00 150 0.5715772 0.21562228
##
## Tuning parameter 'gamma' was held constant at a value of 0
## Tuning parameter
## 'min_child_weight' was held constant at a value of 1
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were nrounds = 150, max_depth = 3, eta = 0.3, gamma =
## 0, colsample_bytree = 0.6, min_child_weight = 1 and subsample = 0.75.
mean_accuracy_xgb_model<- mean(xgb_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_xgb_model)
## [1] 0.5663579
FeatEval_Mean_mean_accuracy_cv_xgb<-mean_accuracy_xgb_model
print(FeatEval_Mean_mean_accuracy_cv_xgb)
## [1] 0.5663579
train_predictions <- predict(xgb_model, newdata = trainData_XGB1, type = "raw")
train_accuracy <- mean(train_predictions == trainData_XGB1$DX)
FeatEval_Mean_xgb_trainAccuracy <- train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
print(FeatEval_Mean_xgb_trainAccuracy)
## [1] 1
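A training accuracy of 1 against a cross-validated accuracy near 0.57 points to heavy overfitting. One possible counter-measure (not used in this report) is early stopping on a held-out set via xgboost's native `xgb.train()`; a toy sketch on `iris`, assuming an xgboost version that accepts the `watchlist` argument:

```r
library(xgboost)
set.seed(123)
x <- as.matrix(iris[, 1:4])
y <- as.integer(iris$Species) - 1L   # 0-based class labels
idx <- sample(nrow(x), 100)
dtrain <- xgb.DMatrix(x[idx, ], label = y[idx])
dvalid <- xgb.DMatrix(x[-idx, ], label = y[-idx])
fit <- xgb.train(params = list(objective = "multi:softprob", num_class = 3,
                               eta = 0.3, max_depth = 3),
                 data = dtrain, nrounds = 200,
                 watchlist = list(valid = dvalid),
                 early_stopping_rounds = 10, verbose = 0)
# boosting halts once validation loss stops improving for 10 rounds
cat("best iteration:", fit$best_iteration, "\n")
```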
predictions <- predict(xgb_model, newdata = testData_XGB1)
cm_FeatEval_Mean_xgb <-caret::confusionMatrix(predictions,testData_XGB1$DX)
print(cm_FeatEval_Mean_xgb)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN Dementia MCI
## CN 37 9 15
## Dementia 0 4 2
## MCI 29 15 82
##
## Overall Statistics
##
## Accuracy : 0.6373
## 95% CI : (0.5652, 0.7051)
## No Information Rate : 0.513
## P-Value [Acc > NIR] : 0.0003288
##
## Kappa : 0.3436
##
## Mcnemar's Test P-Value : 3.34e-05
##
## Statistics by Class:
##
## Class: CN Class: Dementia Class: MCI
## Sensitivity 0.5606 0.14286 0.8283
## Specificity 0.8110 0.98788 0.5319
## Pos Pred Value 0.6066 0.66667 0.6508
## Neg Pred Value 0.7803 0.87166 0.7463
## Prevalence 0.3420 0.14508 0.5130
## Detection Rate 0.1917 0.02073 0.4249
## Detection Prevalence 0.3161 0.03109 0.6528
## Balanced Accuracy 0.6858 0.56537 0.6801
cm_FeatEval_Mean_xgb_Accuracy <-cm_FeatEval_Mean_xgb$overall["Accuracy"]
cm_FeatEval_Mean_xgb_Kappa <-cm_FeatEval_Mean_xgb$overall["Kappa"]
print(cm_FeatEval_Mean_xgb_Accuracy)
## Accuracy
## 0.6373057
print(cm_FeatEval_Mean_xgb_Kappa)
## Kappa
## 0.3435693
importance_xgb_model<- varImp(xgb_model)
print(importance_xgb_model)
## xgbTree variable importance
##
## only 20 most important variables shown (out of 155)
##
## Overall
## age.now 100.00
## cg00962106 56.99
## cg09584650 53.51
## cg08857872 51.69
## cg14710850 48.76
## cg15501526 47.66
## cg02356645 47.36
## cg24861747 46.85
## cg03084184 46.62
## cg16771215 45.84
## cg02225060 45.82
## cg00154902 43.79
## cg03088219 43.41
## cg06864789 42.64
## cg02981548 42.60
## cg05234269 42.19
## cg17186592 41.63
## cg14293999 41.06
## cg01921484 40.81
## cg01013522 40.80
plot(importance_xgb_model, top = 20, main = "Variable Importance Plot")
importance_xgb_model_df<-importance_xgb_model$importance
importance <- xgb.importance(model = xgb_model$finalModel)
xgb.plot.importance(importance_matrix = importance)
ordered_importance <- importance[order(-importance$Importance), ]
print(ordered_importance)
## Feature Gain Cover Frequency Importance
## <char> <num> <num> <num> <num>
## 1: age.now 0.0335992468 0.0357993760 0.0162321692 0.0335992468
## 2: cg00962106 0.0192050561 0.0286619797 0.0152484014 0.0192050561
## 3: cg09584650 0.0180386412 0.0195146141 0.0113133301 0.0180386412
## 4: cg08857872 0.0174293205 0.0207729971 0.0118052140 0.0174293205
## 5: cg14710850 0.0164488979 0.0146623552 0.0118052140 0.0164488979
## ---
## 151: cg20370184 0.0009022372 0.0006686374 0.0024594196 0.0009022372
## 152: cg00272795 0.0008131386 0.0010294737 0.0019675357 0.0008131386
## 153: cg12466610 0.0007405692 0.0008348438 0.0024594196 0.0007405692
## 154: cg20678988 0.0004758319 0.0018982431 0.0054107231 0.0004758319
## 155: cg27272246 0.0001311978 0.0004099423 0.0009837678 0.0001311978
stopCluster(c2)
registerDoSEQ()
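The importance tables above feed the Top-feature selection step described at the start of this version. As a minimal illustration of the mean-importance selection method (not the pipeline's exact code; the feature names and values below are made up to mimic the `xgb.importance()` output), features are kept when their importance reaches the mean across all features:

```r
# Illustrative sketch of mean-based feature selection.
# The toy values below are invented; in the pipeline they would come from
# the ordered_importance table computed above.
toy_importance <- data.frame(
  Feature    = c("age.now", "cg00962106", "cg09584650", "cg20678988", "cg27272246"),
  Importance = c(0.034, 0.019, 0.018, 0.0005, 0.0001)
)
mean_cut <- mean(toy_importance$Importance)
# keep features whose importance is at or above the mean
selected_mean <- toy_importance$Feature[toy_importance$Importance >= mean_cut]
print(selected_mean)
```

The same pattern applies to the real `ordered_importance` table: replace the toy frame with it and the cutoff stays `mean(Importance)`.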
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_xgb_AUC <-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_xgb_AUC <-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_xgb_AUC <-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData_XGB1$DX)
for (class in classes) {
binary_labels <- ifelse(testData_XGB1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
class_cols <- seq_along(classes) + 1  # one distinct color per class, shared with the legend
plot(roc_curves[[1]], col = class_cols[1],
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = class_cols[i], lwd = 2)
}
legend("bottomright", legend = classes, col = class_cols, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.731
## The AUC value for class CN is: 0.7309711
##
## Class: Dementia
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.6333
## The AUC value for class Dementia is: 0.6333333
##
## Class: MCI
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.7237
## The AUC value for class MCI is: 0.7237266
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Mean_xgb_AUC <- mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.6960104
print(FeatEval_Mean_xgb_AUC)
## [1] 0.6960104
library(caret)
library(randomForest)
df_RFM1<-processed_data
featureName_RFM1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_RFM1$DX, p = 0.7, list = FALSE)
train_data_RFM1 <- df_RFM1[trainIndex, ]
test_data_RFM1 <- df_RFM1[-trainIndex, ]
X_train_RFM1 <- subset(train_data_RFM1, select = -DX)
y_train_RFM1 <- train_data_RFM1$DX
X_test_RFM1 <- subset(test_data_RFM1, select = -DX)
y_test_RFM1 <- test_data_RFM1$DX
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)
rf_model <- caret::train(
DX ~ ., data = train_data_RFM1,
method = "rf", trControl = ctrl,
metric = "Accuracy",
importance = TRUE
)
print(rf_model)
## Random Forest
##
## 455 samples
## 155 predictors
## 3 classes: 'CN', 'Dementia', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 364, 365, 363, 364, 364
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.5297930 0.03900252
## 78 0.5670845 0.15328626
## 155 0.5361687 0.09604486
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 78.
mean_accuracy_rf_model<- mean(rf_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_rf_model)
## [1] 0.5443487
FeatEval_Mean_mean_accuracy_cv_rf<-mean_accuracy_rf_model
print(FeatEval_Mean_mean_accuracy_cv_rf)
## [1] 0.5443487
train_predictions <- predict(rf_model, newdata = train_data_RFM1, type = "raw")
train_accuracy <- mean(train_predictions == train_data_RFM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
FeatEval_Mean_rf_trainAccuracy<-train_accuracy
print(FeatEval_Mean_rf_trainAccuracy)
## [1] 1
predictions <- predict(rf_model, newdata = test_data_RFM1)
cm_FeatEval_Mean_rf<-caret::confusionMatrix(predictions,test_data_RFM1$DX)
print(cm_FeatEval_Mean_rf)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN Dementia MCI
## CN 16 4 6
## Dementia 0 0 0
## MCI 50 24 93
##
## Overall Statistics
##
## Accuracy : 0.5648
## 95% CI : (0.4917, 0.6358)
## No Information Rate : 0.513
## P-Value [Acc > NIR] : 0.08547
##
## Kappa : 0.1467
##
## Mcnemar's Test P-Value : 1.658e-13
##
## Statistics by Class:
##
## Class: CN Class: Dementia Class: MCI
## Sensitivity 0.2424 0.0000 0.9394
## Specificity 0.9213 1.0000 0.2128
## Pos Pred Value 0.6154 NaN 0.5569
## Neg Pred Value 0.7006 0.8549 0.7692
## Prevalence 0.3420 0.1451 0.5130
## Detection Rate 0.0829 0.0000 0.4819
## Detection Prevalence 0.1347 0.0000 0.8653
## Balanced Accuracy 0.5818 0.5000 0.5761
cm_FeatEval_Mean_rf_Accuracy<-cm_FeatEval_Mean_rf$overall["Accuracy"]
print(cm_FeatEval_Mean_rf_Accuracy)
## Accuracy
## 0.5647668
cm_FeatEval_Mean_rf_Kappa<-cm_FeatEval_Mean_rf$overall["Kappa"]
print(cm_FeatEval_Mean_rf_Kappa)
## Kappa
## 0.1467368
importance_rf_model <- varImp(rf_model)
print(importance_rf_model)
## rf variable importance
##
## variables are sorted by maximum importance across the classes
## only 20 most important variables shown (out of 155)
##
## CN Dementia MCI
## cg15501526 69.03 18.177 100.00
## age.now 52.94 49.707 76.90
## cg01153376 36.87 52.235 64.43
## cg00962106 58.70 7.822 55.31
## cg06864789 18.69 56.208 36.56
## cg25259265 25.46 36.913 55.20
## cg12012426 25.46 23.629 52.64
## cg01013522 33.87 24.581 52.05
## cg08857872 22.43 51.916 49.54
## cg02494911 31.61 11.191 51.26
## cg04412904 50.30 2.344 25.61
## cg10985055 30.07 50.212 41.01
## cg11133939 49.41 28.237 34.88
## cg05234269 40.04 22.300 48.68
## cg02356645 18.63 48.341 27.81
## cg11438323 42.84 23.713 47.53
## cg06112204 30.93 47.297 38.54
## cg22274273 20.68 47.154 22.75
## cg16771215 23.06 17.890 47.07
## cg03088219 46.55 21.499 25.60
plot(importance_rf_model, top = 20, main = "Variable Importance Plot")
importance_rf_model_df<-importance_rf_model$importance
if( METHOD_FEATURE_FLAG==5 ){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(MCI))
print(Ordered_importance_rf_final_model)
}
if( METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==6 ){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(Dementia))
print(Ordered_importance_rf_final_model)
}
if(METHOD_FEATURE_FLAG==3 ){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(CI))
print(Ordered_importance_rf_final_model)
}
if(METHOD_FEATURE_FLAG==1){
# for the multi classification case,
# for each feature, we will choose the maximum importance value
# Add a column for the maximum importance
importance_rf_model_df$Feature<-rownames(importance_rf_model_df)
importance_rf_model_df <- importance_rf_model_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_rf_model_df)
}
## CN Dementia MCI Feature MaxImportance
## 1 69.025719 18.177097 100.000000 cg15501526 100.00000
## 2 52.940101 49.706685 76.896452 age.now 76.89645
## 3 36.869599 52.234879 64.426116 cg01153376 64.42612
## 4 58.696248 7.821543 55.305665 cg00962106 58.69625
## 5 18.694844 56.207569 36.562529 cg06864789 56.20757
## 6 25.460182 36.913238 55.204486 cg25259265 55.20449
## 7 25.459089 23.628894 52.636632 cg12012426 52.63663
## 8 33.874554 24.580893 52.045094 cg01013522 52.04509
## 9 22.433376 51.915676 49.544823 cg08857872 51.91568
## 10 31.611483 11.190557 51.263895 cg02494911 51.26390
## 11 50.304236 2.344009 25.606998 cg04412904 50.30424
## 12 30.066302 50.212412 41.005153 cg10985055 50.21241
## 13 49.414407 28.237257 34.882248 cg11133939 49.41441
## 14 40.044177 22.300343 48.676628 cg05234269 48.67663
## 15 18.634563 48.341100 27.811350 cg02356645 48.34110
## 16 42.837342 23.712832 47.533366 cg11438323 47.53337
## 17 30.926398 47.297242 38.542443 cg06112204 47.29724
## 18 20.675659 47.154365 22.752082 cg22274273 47.15437
## 19 23.058906 17.889971 47.071727 cg16771215 47.07173
## 20 46.546641 21.499184 25.598649 cg03088219 46.54664
## 21 46.299156 24.383707 43.194730 cg25879395 46.29916
## 22 46.221945 40.628164 38.058129 cg06118351 46.22194
## 23 37.823728 45.468390 29.489845 cg05096415 45.46839
## 24 19.977113 45.432659 3.047480 cg07152869 45.43266
## 25 45.261122 10.706496 25.065316 cg00999469 45.26112
## 26 31.579313 26.028728 45.140149 cg23432430 45.14015
## 27 44.676075 20.998192 26.896975 cg14710850 44.67607
## 28 44.616056 41.480593 36.142432 cg17186592 44.61606
## 29 28.717895 44.566513 21.750447 cg00154902 44.56651
## 30 30.513255 44.091763 13.097988 cg16788319 44.09176
## 31 36.199622 42.674675 44.052700 cg11331837 44.05270
## 32 43.858843 29.740998 28.556901 cg20685672 43.85884
## 33 26.437345 42.765073 43.784802 cg03129555 43.78480
## 34 37.500447 14.107280 43.770592 PC2 43.77059
## 35 37.274618 36.739745 43.632531 cg13080267 43.63253
## 36 43.458896 15.822039 34.335056 cg20507276 43.45890
## 37 12.515130 43.449644 24.860904 cg03084184 43.44964
## 38 30.059870 25.487357 43.314275 cg02621446 43.31427
## 39 31.089206 26.854276 43.115162 cg02320265 43.11516
## 40 43.012848 30.906569 19.322140 cg17429539 43.01285
## 41 19.006107 19.029049 42.874501 cg20370184 42.87450
## 42 42.852496 34.882704 24.284788 cg00616572 42.85250
## 43 28.015046 42.724643 31.100165 cg01667144 42.72464
## 44 36.746150 42.285753 23.632463 cg19503462 42.28575
## 45 42.161211 41.618990 42.233668 cg14293999 42.23367
## 46 24.649693 39.956340 42.211846 cg06950937 42.21185
## 47 25.893410 42.189364 27.351884 cg16178271 42.18936
## 48 26.846531 20.739224 42.099121 cg12228670 42.09912
## 49 29.509575 42.035406 28.211285 cg10750306 42.03541
## 50 13.450261 41.727508 26.841960 cg24861747 41.72751
## 51 24.617711 6.911054 41.721486 cg01128042 41.72149
## 52 17.720290 31.935312 41.377142 cg00689685 41.37714
## 53 41.295230 20.948864 22.647020 cg17738613 41.29523
## 54 41.056089 40.016054 31.962280 cg27086157 41.05609
## 55 40.724803 34.122964 39.066960 cg01921484 40.72480
## 56 40.134678 26.430712 13.687010 cg12776173 40.13468
## 57 39.436174 28.509216 40.003148 cg10240127 40.00315
## 58 37.591823 33.612391 39.422725 cg26069044 39.42273
## 59 39.373458 22.645050 22.498601 cg10738648 39.37346
## 60 39.234371 35.604887 10.944789 cg18821122 39.23437
## 61 36.174663 30.393198 38.806667 cg14564293 38.80667
## 62 16.667810 30.499501 38.758682 cg16652920 38.75868
## 63 38.560108 35.742883 28.200941 cg21697769 38.56011
## 64 31.778177 38.498127 22.643868 cg01413796 38.49813
## 65 26.901479 38.486454 21.635240 cg15775217 38.48645
## 66 28.690335 38.406036 31.086838 cg24851651 38.40604
## 67 23.820878 38.116407 20.925077 cg14924512 38.11641
## 68 34.688524 38.085264 26.599561 cg16749614 38.08526
## 69 19.981965 27.810180 37.923333 cg12682323 37.92333
## 70 4.149119 20.574224 37.916308 cg09854620 37.91631
## 71 37.460302 9.901765 31.761301 cg15633912 37.46030
## 72 20.010266 20.719787 37.368421 cg04248279 37.36842
## 73 33.287053 37.226645 36.190064 cg14240646 37.22665
## 74 26.703136 37.116289 33.311032 cg00247094 37.11629
## 75 25.377940 36.626672 22.425621 cg25561557 36.62667
## 76 20.333794 36.564382 32.065371 cg14527649 36.56438
## 77 36.356828 28.241690 25.952427 cg18339359 36.35683
## 78 22.137789 36.198781 23.834773 cg23161429 36.19878
## 79 35.863431 13.807607 20.116686 cg21854924 35.86343
## 80 19.987368 27.391614 35.673944 cg02981548 35.67394
## 81 20.193951 35.305710 19.474929 cg06378561 35.30571
## 82 34.165716 35.191144 24.325520 cg04664583 35.19114
## 83 22.292643 35.135818 30.696590 cg12279734 35.13582
## 84 35.064168 12.339826 21.036389 cg16715186 35.06417
## 85 9.272829 31.203990 34.764027 cg01549082 34.76403
## 86 21.119569 34.728878 22.615180 cg12738248 34.72888
## 87 22.408002 34.692897 21.659519 cg14307563 34.69290
## 88 23.042034 13.574008 34.490826 cg03071582 34.49083
## 89 20.282595 30.621684 34.359566 cg15865722 34.35957
## 90 21.629305 33.960839 19.427351 cg24859648 33.96084
## 91 33.943863 33.580898 30.422424 cg23658987 33.94386
## 92 17.718889 33.908121 9.715448 cg27341708 33.90812
## 93 7.274082 33.903893 32.682399 cg17421046 33.90389
## 94 17.545703 25.351852 33.872028 cg07028768 33.87203
## 95 16.260143 33.847630 8.138731 cg02372404 33.84763
## 96 27.551991 33.205867 16.910630 cg20913114 33.20587
## 97 24.252896 33.168730 29.175380 cg06697310 33.16873
## 98 30.824030 33.142282 7.200007 PC1 33.14228
## 99 31.115880 33.106914 22.203223 cg26757229 33.10691
## 100 19.038944 32.886627 11.810018 cg26948066 32.88663
## 101 32.808660 27.907435 31.911504 cg19377607 32.80866
## 102 32.548751 15.391325 29.901839 cg18819889 32.54875
## 103 17.314251 28.711213 32.536841 cg02225060 32.53684
## 104 16.911844 32.396125 23.929062 cg13885788 32.39613
## 105 19.009425 22.224641 32.359812 cg12466610 32.35981
## 106 26.901023 32.234200 20.492517 cg24873924 32.23420
## 107 13.267439 21.407984 31.940465 cg02932958 31.94046
## 108 14.144413 31.715035 11.737783 cg12534577 31.71504
## 109 6.516654 27.971840 31.646693 cg20678988 31.64669
## 110 10.424939 31.609860 26.239056 cg12146221 31.60986
## 111 10.011114 31.501308 28.412194 cg00675157 31.50131
## 112 31.287358 16.421843 9.298470 cg25758034 31.28736
## 113 19.895459 31.081147 18.341435 cg11247378 31.08115
## 114 30.890997 13.775058 15.847767 cg01680303 30.89100
## 115 30.802689 18.119419 17.160935 cg12784167 30.80269
## 116 21.258584 30.648745 22.223083 cg15535896 30.64875
## 117 30.151766 29.954159 23.131955 cg08198851 30.15177
## 118 24.048430 13.279270 30.009761 cg23916408 30.00976
## 119 22.069094 29.859201 13.147028 cg27577781 29.85920
## 120 23.349590 29.813060 25.382960 cg03327352 29.81306
## 121 29.462696 17.536988 1.464224 cg05321907 29.46270
## 122 29.179307 16.741326 28.448292 cg27452255 29.17931
## 123 29.160140 28.128132 22.982201 cg00322003 29.16014
## 124 29.159431 13.005837 26.484901 PC3 29.15943
## 125 22.903287 28.955661 0.000000 cg12284872 28.95566
## 126 28.701494 25.622556 13.570544 cg21209485 28.70149
## 127 25.291716 26.489500 28.663591 cg26219488 28.66359
## 128 24.105011 28.560061 23.113178 cg27272246 28.56006
## 129 23.245817 28.366537 26.140540 cg19512141 28.36654
## 130 28.204013 20.953862 24.564416 cg26474732 28.20401
## 131 27.445247 20.287962 27.885571 cg03982462 27.88557
## 132 27.807253 25.895347 12.980296 cg11227702 27.80725
## 133 22.227400 27.499933 27.664774 cg20139683 27.66477
## 134 27.617145 13.209045 10.869218 cg08779649 27.61714
## 135 27.478548 11.512734 19.930825 cg01933473 27.47855
## 136 27.413810 25.109085 13.860779 cg09584650 27.41381
## 137 6.648521 27.048875 23.921760 cg07523188 27.04887
## 138 5.627298 21.246600 26.708911 cg06536614 26.70891
## 139 22.843345 13.539292 26.206196 cg17906851 26.20620
## 140 24.771697 22.371546 25.860501 cg27639199 25.86050
## 141 15.595035 20.526671 25.827393 cg07480176 25.82739
## 142 13.652947 25.802567 24.144964 cg00272795 25.80257
## 143 23.164857 25.510611 7.807607 cg05841700 25.51061
## 144 24.694456 25.471377 25.085184 cg06715136 25.47138
## 145 18.083380 12.418059 25.460612 cg08584917 25.46061
## 146 24.893961 23.820587 24.211358 cg25436480 24.89396
## 147 16.585777 23.568968 18.357646 cg05570109 23.56897
## 148 11.873910 23.215783 1.839512 cg03660162 23.21578
## 149 12.070995 22.972798 15.202215 cg16579946 22.97280
## 150 19.252382 16.162452 22.169831 cg07138269 22.16983
## 151 5.336951 21.106379 17.424658 cg11187460 21.10638
## 152 20.968838 2.668884 4.100231 cg00696044 20.96884
## 153 20.696309 17.271723 20.839111 cg08861434 20.83911
## 154 18.659976 17.001843 15.574271 cg10369879 18.65998
## 155 18.201929 2.798671 16.020454 cg24506579 18.20193
if(METHOD_FEATURE_FLAG == 1){
importance_melted_rf_model_df <- importance_rf_model_df %>%
dplyr::select(-MaxImportance) %>%
reshape2::melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_rf_model_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_rf_model_df %>% head(20))
print("the top 20 features based on max way:")
print(head(importance_rf_model_df,n=20)$Feature)
importance_melted_rf_model_df <- importance_rf_model_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
reshape2::melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_rf_model_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
## CN Dementia MCI Feature MaxImportance
## 1 69.02572 18.177097 100.00000 cg15501526 100.00000
## 2 52.94010 49.706685 76.89645 age.now 76.89645
## 3 36.86960 52.234879 64.42612 cg01153376 64.42612
## 4 58.69625 7.821543 55.30567 cg00962106 58.69625
## 5 18.69484 56.207569 36.56253 cg06864789 56.20757
## 6 25.46018 36.913238 55.20449 cg25259265 55.20449
## 7 25.45909 23.628894 52.63663 cg12012426 52.63663
## 8 33.87455 24.580893 52.04509 cg01013522 52.04509
## 9 22.43338 51.915676 49.54482 cg08857872 51.91568
## 10 31.61148 11.190557 51.26390 cg02494911 51.26390
## 11 50.30424 2.344009 25.60700 cg04412904 50.30424
## 12 30.06630 50.212412 41.00515 cg10985055 50.21241
## 13 49.41441 28.237257 34.88225 cg11133939 49.41441
## 14 40.04418 22.300343 48.67663 cg05234269 48.67663
## 15 18.63456 48.341100 27.81135 cg02356645 48.34110
## 16 42.83734 23.712832 47.53337 cg11438323 47.53337
## 17 30.92640 47.297242 38.54244 cg06112204 47.29724
## 18 20.67566 47.154365 22.75208 cg22274273 47.15437
## 19 23.05891 17.889971 47.07173 cg16771215 47.07173
## 20 46.54664 21.499184 25.59865 cg03088219 46.54664
## [1] "the top 20 features based on max way:"
## [1] "cg15501526" "age.now" "cg01153376" "cg00962106" "cg06864789" "cg25259265" "cg12012426"
## [8] "cg01013522" "cg08857872" "cg02494911" "cg04412904" "cg10985055" "cg11133939" "cg05234269"
## [15] "cg02356645" "cg11438323" "cg06112204" "cg22274273" "cg16771215" "cg03088219"
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_rf_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_rf_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_rf_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(test_data_RFM1$DX)
for (class in classes) {
binary_labels <- ifelse(test_data_RFM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
class_cols <- seq_along(classes) + 1  # one distinct color per class, shared with the legend
plot(roc_curves[[1]], col = class_cols[1],
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = class_cols[i], lwd = 2)
}
legend("bottomright", legend = classes, col = class_cols, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.6964
## The AUC value for class CN is: 0.6963732
##
## Class: Dementia
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.6085
## The AUC value for class Dementia is: 0.6085498
##
## Class: MCI
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.6561
## The AUC value for class MCI is: 0.6560821
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Mean_rf_AUC <- mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.6536684
print(FeatEval_Mean_rf_AUC)
## [1] 0.6536684
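With both the XGBoost and random forest importance rankings now available, the frequency / common-feature selection method mentioned at the top of this version can be sketched. This is an illustrative example only: the two vectors are hand-picked from the importance tables above rather than computed, and the length of each "top" list is an assumed cutoff.

```r
# Illustrative sketch of frequency / common-feature selection:
# keep features that appear in the top list of more than one model.
# These vectors are hand-picked for the example, not computed.
top_xgb <- c("age.now", "cg00962106", "cg09584650", "cg08857872")
top_rf  <- c("cg15501526", "age.now", "cg01153376", "cg00962106")
common_features <- intersect(top_xgb, top_rf)
print(common_features)  # features important in both models
```

In the pipeline the vectors would be the top-N `Feature` names from each model's ordered importance table, and `Reduce(intersect, list_of_top_lists)` generalizes this to three or more models.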
df_SVM<-processed_data
featureName_SVM1<-AfterProcess_FeatureName
trainIndex <- createDataPartition(df_SVM$DX, p = 0.7, list = FALSE)
train_data_SVM1 <- df_SVM[trainIndex, ]
test_data_SVM1 <- df_SVM[-trainIndex, ]
X_train_SVM1 <- subset(train_data_SVM1,select = -DX)
y_train_SVM1 <- train_data_SVM1$DX
X_test_SVM1 <- subset(test_data_SVM1, select= -DX )
y_test_SVM1 <- test_data_SVM1$DX
train_control <- trainControl(method = "cv", number = 5, classProbs = TRUE)
svm_model <- caret::train(DX ~ ., data = train_data_SVM1,
method = "svmRadial",
trControl = train_control)
print(svm_model)
## Support Vector Machines with Radial Basis Function Kernel
##
## 455 samples
## 155 predictors
## 3 classes: 'CN', 'Dementia', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 364, 363, 364, 364, 365
## Resampling results across tuning parameters:
##
## C Accuracy Kappa
## 0.25 0.7187907 0.5379918
## 0.50 0.7056033 0.5140213
## 1.00 0.7100228 0.5155945
##
## Tuning parameter 'sigma' was held constant at a value of 0.003271835
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.003271835 and C = 0.25.
print(svm_model$bestTune)
## sigma C
## 1 0.003271835 0.25
mean_accuracy_svm_model<- mean(svm_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_svm_model)
## [1] 0.7114723
FeatEval_Mean_mean_accuracy_cv_svm<-mean_accuracy_svm_model
print(FeatEval_Mean_mean_accuracy_cv_svm)
## [1] 0.7114723
train_predictions <- predict(svm_model, newdata = train_data_SVM1)
train_accuracy <- mean(train_predictions == train_data_SVM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.951648351648352"
FeatEval_Mean_svm_trainAccuracy <- train_accuracy
print(FeatEval_Mean_svm_trainAccuracy)
## [1] 0.9516484
predictions <- predict(svm_model, newdata = test_data_SVM1)
cm_FeatEval_Mean_svm<-caret::confusionMatrix(predictions,test_data_SVM1$DX)
print(cm_FeatEval_Mean_svm)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN Dementia MCI
## CN 44 7 17
## Dementia 3 16 10
## MCI 19 5 72
##
## Overall Statistics
##
## Accuracy : 0.6839
## 95% CI : (0.6133, 0.7488)
## No Information Rate : 0.513
## P-Value [Acc > NIR] : 1.077e-06
##
## Kappa : 0.4755
##
## Mcnemar's Test P-Value : 0.337
##
## Statistics by Class:
##
## Class: CN Class: Dementia Class: MCI
## Sensitivity 0.6667 0.5714 0.7273
## Specificity 0.8110 0.9212 0.7447
## Pos Pred Value 0.6471 0.5517 0.7500
## Neg Pred Value 0.8240 0.9268 0.7216
## Prevalence 0.3420 0.1451 0.5130
## Detection Rate 0.2280 0.0829 0.3731
## Detection Prevalence 0.3523 0.1503 0.4974
## Balanced Accuracy 0.7388 0.7463 0.7360
cm_FeatEval_Mean_svm_Accuracy <- cm_FeatEval_Mean_svm$overall["Accuracy"]
cm_FeatEval_Mean_svm_Kappa <- cm_FeatEval_Mean_svm$overall["Kappa"]
print(cm_FeatEval_Mean_svm_Accuracy)
## Accuracy
## 0.6839378
print(cm_FeatEval_Mean_svm_Kappa)
## Kappa
## 0.4754734
Let’s take a look at the feature importance of the trained model.
library(iml)
predictor_SVM <- Predictor$new(svm_model,data = df_SVM,y=df_SVM$DX)
importance_SVM <- FeatureImp$new(predictor_SVM,loss="ce")
print(importance_SVM)
## Interpretation method: FeatureImp
## error function: ce
##
## Analysed predictor:
## Prediction task: classification
## Classes:
##
## Analysed data:
## Sampling from data.frame with 648 rows and 156 columns.
##
##
## Head of results:
## feature importance.05 importance importance.95 permutation.error
## 1 cg23432430 1.0481928 1.072289 1.081928 0.1373457
## 2 cg24859648 1.0385542 1.072289 1.091566 0.1373457
## 3 cg15535896 1.0530120 1.072289 1.081928 0.1373457
## 4 cg26948066 1.0192771 1.060241 1.101205 0.1358025
## 5 cg25879395 0.9975904 1.060241 1.091566 0.1358025
## 6 cg14924512 1.0385542 1.060241 1.060241 0.1358025
plot(importance_SVM)
library(vip)
vip(svm_model, method = "permute", train = train_data_SVM1, target = "DX", nsim = 10, metric = "bal_accuracy", pred_wrapper = predict)
importance_SVM_df<-importance_SVM$results
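The permutation importance results stored in `importance_SVM_df` can also drive the median-quantile selection method listed at the top of this version. A minimal sketch, assuming a made-up results frame shaped like the `FeatureImp` output above (the numbers are illustrative, not the real ones):

```r
# Illustrative sketch of median-quantile feature selection:
# keep features whose permutation importance exceeds the 50% quantile.
# The frame mimics importance_SVM$results with invented values.
toy_perm <- data.frame(
  feature    = c("cg23432430", "cg24859648", "cg15535896", "cg26948066"),
  importance = c(1.072, 1.072, 1.060, 0.990)
)
median_cut <- quantile(toy_perm$importance, probs = 0.5)
selected_median <- toy_perm$feature[toy_perm$importance > median_cut]
print(selected_median)
```

Raising `probs` (e.g. to 0.75) makes the filter stricter; with the real `importance_SVM_df` the same two lines apply to its `feature` and `importance` columns.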
if(METHOD_FEATURE_FLAG == 5){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The AUC value is:")
auc_value <- roc_curve$auc
print(auc_value)
FeatEval_Mean_svm_AUC <- auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The AUC value is:")
auc_value <- roc_curve$auc
print(auc_value)
FeatEval_Mean_svm_AUC <- auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 3) {
  library(e1071)
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
  roc_curve <- roc(test_data_SVM1$DX,
                   prob_predictions[, "CI"],
                   levels = rev(levels(test_data_SVM1$DX)))
  print(roc_curve)
  print("The auc value is:")
  auc_value <- roc_curve$auc
  print(auc_value)
  FeatEval_Mean_svm_AUC <- auc_value
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1) {
  prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  # Use the SVM test set here so the labels match prob_predictions
  classes <- levels(test_data_SVM1$DX)
  for (class in classes) {
    binary_labels <- ifelse(test_data_SVM1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  plot(roc_curves[[1]], col = "blue",
       lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 1:length(classes) + 1, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls > cases
## Setting levels: control = 0, case = 1
## Setting direction: controls > cases
## Setting levels: control = 0, case = 1
## Setting direction: controls > cases
## Class: CN
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) > 66 cases (binary_labels 1).
## Area under the curve: 0.5173
## The AUC value for class CN is: 0.517299
##
## Class: Dementia
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) > 28 cases (binary_labels 1).
## Area under the curve: 0.5478
## The AUC value for class Dementia is: 0.5478355
##
## Class: MCI
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) > 99 cases (binary_labels 1).
## Area under the curve: 0.5609
## The AUC value for class MCI is: 0.5609284
if (METHOD_FEATURE_FLAG == 1) {
  mean_auc <- mean(auc_values)
  cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
  FeatEval_Mean_svm_AUC <- mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.542021
print(FeatEval_Mean_svm_AUC)
## [1] 0.542021
Performance of the features selected by the median feature-importance method
processed_dataFrame<-df_selected_Median
processed_data<-output_median_feature
AfterProcess_FeatureName<-Selected_median_imp_Name
print(head(output_median_feature))
## # A tibble: 6 × 156
## DX PC1 cg00962106 cg16652920 PC3 cg27452255 cg08861434 cg06864789 cg08857872
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 MCI -0.214 0.912 0.944 -0.0140 0.900 0.877 0.0537 0.340
## 2 CN -0.173 0.538 0.943 0.00506 0.659 0.435 0.461 0.818
## 3 CN -0.00367 0.504 0.946 0.0291 0.901 0.870 0.875 0.297
## 4 Dementia -0.187 0.904 0.942 -0.0323 0.890 0.471 0.490 0.295
## 5 MCI 0.0268 0.896 0.953 0.0529 0.578 0.862 0.479 0.894
## 6 CN -0.0379 0.886 0.949 -0.00869 0.881 0.906 0.0542 0.890
## # ℹ 147 more variables: cg07152869 <dbl>, cg09584650 <dbl>, cg16749614 <dbl>, age.now <dbl>,
## # cg05096415 <dbl>, cg23432430 <dbl>, cg01921484 <dbl>, cg02225060 <dbl>, cg02981548 <dbl>,
## # cg14710850 <dbl>, cg19503462 <dbl>, PC2 <dbl>, cg17186592 <dbl>, cg00247094 <dbl>,
## # cg11133939 <dbl>, cg25259265 <dbl>, cg16715186 <dbl>, cg05570109 <dbl>, cg26948066 <dbl>,
## # cg02494911 <dbl>, cg14293999 <dbl>, cg14924512 <dbl>, cg02621446 <dbl>, cg03129555 <dbl>,
## # cg04412904 <dbl>, cg26219488 <dbl>, cg00154902 <dbl>, cg20913114 <dbl>, cg03084184 <dbl>,
## # cg12279734 <dbl>, cg01153376 <dbl>, cg16771215 <dbl>, cg04248279 <dbl>, cg06536614 <dbl>, …
print(Selected_median_imp_Name)
## [1] "PC1" "cg00962106" "cg16652920" "PC3" "cg27452255" "cg08861434" "cg06864789"
## [8] "cg08857872" "cg07152869" "cg09584650" "cg16749614" "age.now" "cg05096415" "cg23432430"
## [15] "cg01921484" "cg02225060" "cg02981548" "cg14710850" "cg19503462" "PC2" "cg17186592"
## [22] "cg00247094" "cg11133939" "cg25259265" "cg16715186" "cg05570109" "cg26948066" "cg02494911"
## [29] "cg14293999" "cg14924512" "cg02621446" "cg03129555" "cg04412904" "cg26219488" "cg00154902"
## [36] "cg20913114" "cg03084184" "cg12279734" "cg01153376" "cg16771215" "cg04248279" "cg06536614"
## [43] "cg09854620" "cg06378561" "cg24859648" "cg10240127" "cg12228670" "cg03327352" "cg12146221"
## [50] "cg03982462" "cg05841700" "cg15865722" "cg07523188" "cg11227702" "cg10369879" "cg16579946"
## [57] "cg24861747" "cg14564293" "cg01128042" "cg00616572" "cg08198851" "cg17421046" "cg15535896"
## [64] "cg18339359" "cg00322003" "cg02372404" "cg11331837" "cg23658987" "cg10738648" "cg25561557"
## [71] "cg01667144" "cg05234269" "cg12534577" "cg06118351" "cg13885788" "cg10750306" "cg15775217"
## [78] "cg01013522" "cg26474732" "cg27086157" "cg03088219" "cg15501526" "cg27577781" "cg11438323"
## [85] "cg06715136" "cg17738613" "cg01680303" "cg06697310" "cg22274273" "cg12738248" "cg21854924"
## [92] "cg14240646" "cg03071582" "cg24873924" "cg17429539" "cg06950937" "cg13080267" "cg27272246"
## [99] "cg27341708" "cg18821122" "cg12682323" "cg12012426" "cg05321907" "cg20139683" "cg20685672"
## [106] "cg26757229" "cg25436480" "cg23916408" "cg20507276" "cg02356645" "cg07028768" "cg00272795"
## [113] "cg25758034" "cg16178271" "cg27639199" "cg11187460" "cg21209485" "cg14527649" "cg23161429"
## [120] "cg19512141" "cg02320265" "cg20370184" "cg12284872" "cg04664583" "cg11247378" "cg26069044"
## [127] "cg25879395" "cg00999469" "cg06112204" "cg02932958" "cg19377607" "cg12784167" "cg07480176"
## [134] "cg00696044" "cg18819889" "cg00689685" "cg00675157" "cg03660162" "cg10985055" "cg07138269"
## [141] "cg21697769" "cg08779649" "cg01933473" "cg17906851" "cg14307563" "cg12776173" "cg24851651"
## [148] "cg08584917" "cg16788319" "cg24506579" "cg01549082" "cg12466610" "cg15633912" "cg01413796"
## [155] "cg20678988"
print(head(df_selected_Median))
## DX PC1 cg00962106 cg16652920 PC3 cg27452255
## 200223270003_R02C01 MCI -0.214185447 0.9124898 0.9436000 -0.014043316 0.9001010
## 200223270003_R03C01 CN -0.172761185 0.5375751 0.9431222 0.005055871 0.6593379
## 200223270003_R06C01 CN -0.003667305 0.5040948 0.9457161 0.029143653 0.9012217
## 200223270003_R07C01 Dementia -0.186779607 0.9039029 0.9419785 -0.032302430 0.8898635
## 200223270006_R01C01 MCI 0.026814649 0.8961556 0.9529417 0.052947950 0.5779792
## 200223270006_R04C01 CN -0.037862929 0.8857597 0.9492648 -0.008685676 0.8809143
## cg08861434 cg06864789 cg08857872 cg07152869 cg09584650 cg16749614 age.now
## 200223270003_R02C01 0.8768306 0.05369415 0.3395280 0.8284151 0.08230254 0.8678741 82.40000
## 200223270003_R03C01 0.4352647 0.46053125 0.8181845 0.5050630 0.09661586 0.8539348 78.60000
## 200223270003_R06C01 0.8698813 0.87513655 0.2970779 0.8352490 0.52399749 0.5874127 80.40000
## 200223270003_R07C01 0.4709249 0.49020327 0.2954090 0.5194300 0.11587211 0.5555391 78.16441
## 200223270006_R01C01 0.8618532 0.47852685 0.8935876 0.5025709 0.42115185 0.8026346 62.90000
## 200223270006_R04C01 0.9058965 0.05423587 0.8901338 0.8080916 0.56043178 0.7903978 80.67796
## cg05096415 cg23432430 cg01921484 cg02225060 cg02981548 cg14710850
## 200223270003_R02C01 0.9182527 0.9482702 0.90985496 0.6828159 0.1342571 0.8048592
## 200223270003_R03C01 0.5177819 0.9455418 0.90931369 0.8265195 0.5220037 0.8090950
## 200223270003_R06C01 0.6288426 0.9418716 0.92044873 0.5209552 0.5098965 0.8285902
## 200223270003_R07C01 0.6060271 0.9426559 0.91674311 0.8078889 0.5660985 0.8336457
## 200223270006_R01C01 0.5599588 0.9461736 0.02943747 0.6084903 0.5678714 0.8500725
## 200223270006_R04C01 0.5441200 0.9508404 0.89057041 0.7638781 0.5079859 0.8207247
## cg19503462 PC2 cg17186592 cg00247094 cg11133939 cg25259265
## 200223270003_R02C01 0.7951675 1.470293e-02 0.9230463 0.5399349 0.1282694 0.4356646
## 200223270003_R03C01 0.4537684 5.745834e-02 0.8593448 0.9315640 0.5920898 0.8893591
## 200223270003_R06C01 0.6997359 8.372861e-02 0.8467599 0.5177874 0.5127706 0.4201700
## 200223270003_R07C01 0.7189778 -1.117250e-02 0.4986373 0.5377765 0.8474176 0.4455517
## 200223270006_R01C01 0.7301755 1.650735e-05 0.8978999 0.9109309 0.8589133 0.8423337
## 200223270006_R04C01 0.4207207 1.571950e-02 0.9239750 0.5266535 0.5246557 0.8460736
## cg16715186 cg05570109 cg26948066 cg02494911 cg14293999 cg14924512
## 200223270003_R02C01 0.2742789 0.3466611 0.4685225 0.3049435 0.2836710 0.5303907
## 200223270003_R03C01 0.7946153 0.5866750 0.5026045 0.2416332 0.9172023 0.9160885
## 200223270003_R06C01 0.8124316 0.4046471 0.9101976 0.2520909 0.9168166 0.9088414
## 200223270003_R07C01 0.7773263 0.6014355 0.9379543 0.2457032 0.9188336 0.9081681
## 200223270006_R01C01 0.8334531 0.5774881 0.9120181 0.8045030 0.1971116 0.9111789
## 200223270006_R04C01 0.8039945 0.8756826 0.8868608 0.7489283 0.9030919 0.5331753
## cg02621446 cg03129555 cg04412904 cg26219488 cg00154902 cg20913114
## 200223270003_R02C01 0.8731313 0.6079616 0.05088595 0.9336638 0.5137741 0.36510482
## 200223270003_R03C01 0.8095534 0.5785498 0.07717659 0.9134707 0.8540746 0.80382984
## 200223270003_R06C01 0.7511582 0.9137818 0.08253743 0.9261878 0.8188126 0.03158439
## 200223270003_R07C01 0.8773609 0.9043041 0.06217431 0.9217866 0.4625776 0.81256840
## 200223270006_R01C01 0.2046541 0.9286357 0.11888769 0.4929692 0.4690086 0.81502059
## 200223270006_R04C01 0.7963817 0.9088564 0.08885846 0.9431574 0.4547219 0.90468830
## cg03084184 cg12279734 cg01153376 cg16771215 cg04248279 cg06536614
## 200223270003_R02C01 0.8162981 0.6435368 0.4872148 0.88389723 0.8534976 0.5824474
## 200223270003_R03C01 0.7877128 0.1494651 0.9639670 0.07196933 0.8458854 0.5746694
## 200223270003_R06C01 0.4546397 0.8760759 0.2242410 0.09949974 0.8332786 0.5773468
## 200223270003_R07C01 0.7812413 0.8674214 0.5155654 0.64234023 0.3303204 0.5848917
## 200223270006_R01C01 0.7818230 0.6454450 0.9588916 0.62679274 0.5966878 0.5669919
## 200223270006_R04C01 0.7725853 0.8660058 0.9586876 0.06970175 0.8939599 0.5718514
## cg09854620 cg06378561 cg24859648 cg10240127 cg12228670 cg03327352
## 200223270003_R02C01 0.5220587 0.9389306 0.83777536 0.9250553 0.8632174 0.8851712
## 200223270003_R03C01 0.8739646 0.9377503 0.44392797 0.9403255 0.8496212 0.8786878
## 200223270003_R06C01 0.8973149 0.5154019 0.03341185 0.9056974 0.8738949 0.3042310
## 200223270003_R07C01 0.8958863 0.9403569 0.43582347 0.9396217 0.8362189 0.8273211
## 200223270006_R01C01 0.9075331 0.4956816 0.03087161 0.9262370 0.8079694 0.8774082
## 200223270006_R04C01 0.9318820 0.9268832 0.02588024 0.9240497 0.6966666 0.8829492
## cg12146221 cg03982462 cg05841700 cg15865722 cg07523188 cg11227702
## 200223270003_R02C01 0.2049284 0.8562777 0.2923544 0.89438595 0.7509183 0.86486075
## 200223270003_R03C01 0.1814927 0.6023731 0.9146488 0.90194372 0.1524386 0.49184121
## 200223270003_R06C01 0.8619250 0.8778458 0.3737990 0.92118977 0.7127592 0.02543724
## 200223270003_R07C01 0.1238469 0.8860227 0.5046468 0.09230759 0.8464983 0.45150971
## 200223270006_R01C01 0.2021598 0.8703107 0.8419031 0.93422668 0.7847738 0.89086877
## 200223270006_R04C01 0.1383786 0.8792860 0.9286652 0.92220002 0.8231277 0.87675947
## cg10369879 cg16579946 cg24861747 cg14564293 cg01128042 cg00616572
## 200223270003_R02C01 0.9218784 0.6306315 0.3540897 0.52089591 0.9113420 0.9335067
## 200223270003_R03C01 0.3149306 0.6648766 0.4309505 0.04000662 0.5328806 0.9214079
## 200223270003_R06C01 0.9141081 0.6455081 0.8071462 0.04959460 0.5222757 0.9113633
## 200223270003_R07C01 0.9054415 0.8979650 0.3347317 0.03114773 0.5141721 0.9160238
## 200223270006_R01C01 0.2917862 0.6886498 0.3544795 0.51703196 0.9321215 0.4861334
## 200223270006_R04C01 0.9200403 0.6766907 0.5997840 0.51535010 0.5050081 0.9067928
## cg08198851 cg17421046 cg15535896 cg18339359 cg00322003 cg02372404
## 200223270003_R02C01 0.6578905 0.9026993 0.3382952 0.8824858 0.1759911 0.03598249
## 200223270003_R03C01 0.6578186 0.9112100 0.9253926 0.9040272 0.5702070 0.02767285
## 200223270003_R06C01 0.1272153 0.8952031 0.3320191 0.8552121 0.3077122 0.03127855
## 200223270003_R07C01 0.8351465 0.9268852 0.9409104 0.3073106 0.6104341 0.55685785
## 200223270006_R01C01 0.8791156 0.1118337 0.9326027 0.8973742 0.6147419 0.02587736
## 200223270006_R04C01 0.1423737 0.4174370 0.9156401 0.2292800 0.2293759 0.02828648
## cg11331837 cg23658987 cg10738648 cg25561557 cg01667144 cg05234269
## 200223270003_R02C01 0.03692842 0.79757644 0.44931577 0.76736369 0.8971484 0.93848584
## 200223270003_R03C01 0.57150125 0.07511718 0.49894016 0.03851635 0.3175389 0.57461229
## 200223270003_R06C01 0.03182862 0.10177571 0.05552024 0.47259480 0.9238364 0.02467208
## 200223270003_R07C01 0.03832164 0.46747992 0.03730440 0.43364249 0.8739442 0.56516794
## 200223270006_R01C01 0.93008298 0.76831297 0.54952781 0.46211439 0.2931961 0.94829529
## 200223270006_R04C01 0.54004452 0.08988532 0.59358167 0.44651530 0.8616530 0.56298286
## cg12534577 cg06118351 cg13885788 cg10750306 cg15775217 cg01013522
## 200223270003_R02C01 0.8585231 0.36339400 0.9380618 0.04919915 0.5707441 0.6251168
## 200223270003_R03C01 0.8493466 0.47148604 0.9369476 0.55160081 0.9168327 0.8862821
## 200223270003_R06C01 0.8395241 0.86559618 0.5163017 0.54694332 0.6042521 0.5425308
## 200223270003_R07C01 0.8511384 0.83494303 0.9183376 0.59824543 0.9062231 0.8429862
## 200223270006_R01C01 0.8804655 0.02632111 0.5525542 0.53158639 0.9083515 0.0480531
## 200223270006_R04C01 0.3029013 0.83329300 0.9328289 0.05646838 0.6383270 0.8240222
## cg26474732 cg27086157 cg03088219 cg15501526 cg27577781 cg11438323
## 200223270003_R02C01 0.7843252 0.9224112 0.844002862 0.6362531 0.8143535 0.4863471
## 200223270003_R03C01 0.8184088 0.9219304 0.007435243 0.6319253 0.8113185 0.8984559
## 200223270003_R06C01 0.7358417 0.3224986 0.120155222 0.7435100 0.8144274 0.8722772
## 200223270003_R07C01 0.7509296 0.3455486 0.826554308 0.7756577 0.7970617 0.5026756
## 200223270006_R01C01 0.8294938 0.8988962 0.066294915 0.3230777 0.8640044 0.8809646
## 200223270006_R04C01 0.8033167 0.9159217 0.574738383 0.8342695 0.8840237 0.8717937
## cg06715136 cg17738613 cg01680303 cg06697310 cg22274273 cg12738248
## 200223270003_R02C01 0.3400192 0.6879612 0.5095174 0.8454609 0.4209386 0.85430866
## 200223270003_R03C01 0.9259109 0.6582258 0.1344941 0.8653044 0.4246379 0.88010292
## 200223270003_R06C01 0.9079807 0.1022257 0.7573869 0.2405168 0.4196796 0.51121855
## 200223270003_R07C01 0.6782105 0.8960156 0.4772204 0.8479193 0.4164100 0.09131476
## 200223270006_R01C01 0.8369052 0.8850702 0.1176263 0.8206613 0.7951105 0.91529345
## 200223270006_R04C01 0.8807568 0.8481916 0.5133033 0.7839595 0.0229810 0.91911405
## cg21854924 cg14240646 cg03071582 cg24873924 cg17429539 cg06950937
## 200223270003_R02C01 0.8729132 0.5391334 0.9187811 0.3060635 0.7860900 0.8910968
## 200223270003_R03C01 0.7162342 0.2538363 0.5844421 0.8640985 0.7100923 0.2889345
## 200223270003_R06C01 0.7520990 0.1864902 0.6245558 0.8259149 0.7660838 0.9143801
## 200223270003_R07C01 0.8641284 0.6402007 0.9283683 0.8333940 0.6984969 0.8891079
## 200223270006_R01C01 0.6498895 0.7696079 0.5715416 0.8761177 0.6508597 0.8868617
## 200223270006_R04C01 0.5943113 0.1490028 0.6534650 0.8585363 0.2828452 0.9093273
## cg13080267 cg27272246 cg27341708 cg18821122 cg12682323 cg12012426
## 200223270003_R02C01 0.78936656 0.8615873 0.48846610 0.9291309 0.9397956 0.9165048
## 200223270003_R03C01 0.78371483 0.8705287 0.02613847 0.5901603 0.9003940 0.9434768
## 200223270003_R06C01 0.09436069 0.8103777 0.86893582 0.5779620 0.9157877 0.9220044
## 200223270003_R07C01 0.09351259 0.0310881 0.02642300 0.9251431 0.9048877 0.9241284
## 200223270006_R01C01 0.45173796 0.7686536 0.47573455 0.9217018 0.1065347 0.9327143
## 200223270006_R04C01 0.49866715 0.4403542 0.89411974 0.5412250 0.8836232 0.9271167
## cg05321907 cg20139683 cg20685672 cg26757229 cg25436480 cg23916408
## 200223270003_R02C01 0.2880477 0.8717075 0.67121006 0.6723726 0.84251599 0.1942275
## 200223270003_R03C01 0.1782629 0.9059433 0.79320906 0.1422661 0.49940321 0.9154993
## 200223270003_R06C01 0.8427929 0.8962554 0.66136456 0.7933794 0.34943119 0.8886255
## 200223270003_R07C01 0.8320504 0.9218012 0.80838304 0.8074830 0.85244913 0.8872447
## 200223270006_R01C01 0.2422218 0.1708472 0.08291414 0.5265692 0.44545117 0.2219945
## 200223270006_R04C01 0.2429551 0.1067122 0.84460055 0.7341953 0.02575036 0.1520624
## cg20507276 cg02356645 cg07028768 cg00272795 cg25758034 cg16178271
## 200223270003_R02C01 0.12238910 0.5105903 0.4496851 0.46365138 0.6114028 0.6445416
## 200223270003_R03C01 0.38721972 0.5833923 0.8536078 0.82839260 0.6649219 0.6178075
## 200223270003_R06C01 0.47978438 0.5701428 0.8356936 0.07231279 0.2393844 0.6641952
## 200223270003_R07C01 0.02261996 0.5683381 0.4245893 0.78303831 0.7071501 0.7148058
## 200223270006_R01C01 0.37465798 0.5233692 0.8835151 0.78219952 0.2301078 0.6138954
## 200223270006_R04C01 0.03570795 0.9188670 0.4514661 0.44408249 0.6891513 0.9414188
## cg27639199 cg11187460 cg21209485 cg14527649 cg23161429 cg19512141
## 200223270003_R02C01 0.67515415 0.03672179 0.8865053 0.2678912 0.8956965 0.8209161
## 200223270003_R03C01 0.67552763 0.92516409 0.8714878 0.7954683 0.9099619 0.7903543
## 200223270003_R06C01 0.06233093 0.03109553 0.2292550 0.8350610 0.8833895 0.8404684
## 200223270003_R07C01 0.05701332 0.53283119 0.2351526 0.8428684 0.9134709 0.2202759
## 200223270006_R01C01 0.05037694 0.54038146 0.8882046 0.8231348 0.8738558 0.8059589
## 200223270006_R04C01 0.08144161 0.91096169 0.2292483 0.8022444 0.9104210 0.7020247
## cg02320265 cg20370184 cg12284872 cg04664583 cg11247378 cg26069044
## 200223270003_R02C01 0.8853213 0.37710950 0.8008333 0.5572814 0.1591185 0.92401867
## 200223270003_R03C01 0.4686314 0.05737964 0.7414569 0.5881190 0.7874849 0.94072227
## 200223270003_R06C01 0.4838749 0.04740505 0.7725267 0.9352717 0.4807942 0.93321315
## 200223270003_R07C01 0.8986848 0.83572095 0.7573369 0.9350230 0.4537348 0.56567694
## 200223270006_R01C01 0.8987560 0.04056608 0.7201607 0.9424588 0.1537079 0.94369927
## 200223270006_R04C01 0.4768520 0.04038589 0.8021446 0.9379537 0.1686356 0.02040391
## cg25879395 cg00999469 cg06112204 cg02932958 cg19377607 cg12784167
## 200223270003_R02C01 0.88130864 0.3274080 0.5251592 0.7901008 0.05377464 0.81503498
## 200223270003_R03C01 0.02603438 0.2857719 0.8773488 0.4210489 0.90570746 0.02811410
## 200223270003_R06C01 0.91060615 0.2499229 0.8867975 0.3825995 0.06636174 0.03073269
## 200223270003_R07C01 0.89205942 0.2819622 0.5613799 0.7617081 0.68788639 0.84775699
## 200223270006_R01C01 0.47886249 0.2933539 0.9184122 0.8431126 0.06338988 0.83825789
## 200223270006_R04C01 0.02145248 0.2966623 0.9152514 0.7610084 0.91551446 0.45475291
## cg07480176 cg00696044 cg18819889 cg00689685 cg00675157 cg03660162
## 200223270003_R02C01 0.5171664 0.55608424 0.9156157 0.7019389 0.9188438 0.8691767
## 200223270003_R03C01 0.3760452 0.07552381 0.9004455 0.8634268 0.9242325 0.5160770
## 200223270003_R06C01 0.6998389 0.79270858 0.9054439 0.6378795 0.9254708 0.9026304
## 200223270003_R07C01 0.2189042 0.03548419 0.9089935 0.8624541 0.5447244 0.5305691
## 200223270006_R01C01 0.5570021 0.10714386 0.9065397 0.6361891 0.5173554 0.9257451
## 200223270006_R04C01 0.4501196 0.18420803 0.9242767 0.6356260 0.9247232 0.8935772
## cg10985055 cg07138269 cg21697769 cg08779649 cg01933473 cg17906851
## 200223270003_R02C01 0.8518169 0.5002290 0.8946108 0.44449401 0.2589014 0.9488392
## 200223270003_R03C01 0.8631895 0.9426707 0.2822953 0.45076825 0.6726133 0.9529718
## 200223270003_R06C01 0.5456633 0.5057781 0.8698740 0.04810217 0.2642560 0.6462151
## 200223270003_R07C01 0.8825100 0.9400527 0.9134887 0.42715969 0.1978068 0.9553497
## 200223270006_R01C01 0.8841690 0.9321602 0.2683820 0.89313476 0.7599441 0.6222117
## 200223270006_R04C01 0.8407797 0.9333501 0.2765740 0.59523771 0.7405661 0.6441202
## cg14307563 cg12776173 cg24851651 cg08584917 cg16788319 cg24506579
## 200223270003_R02C01 0.1855966 0.10388038 0.03674702 0.5663205 0.9379870 0.5244337
## 200223270003_R03C01 0.8916957 0.87306345 0.05358297 0.9019732 0.8913429 0.5794845
## 200223270003_R06C01 0.8750052 0.70094907 0.05968923 0.9187789 0.8680680 0.9427785
## 200223270003_R07C01 0.8975663 0.11367159 0.60864179 0.6007449 0.8811444 0.9323844
## 200223270006_R01C01 0.8762842 0.09458405 0.08825834 0.9069098 0.3123481 0.9185355
## 200223270006_R04C01 0.9168614 0.86532175 0.91932068 0.9263584 0.2995627 0.4332642
## cg01549082 cg12466610 cg15633912 cg01413796 cg20678988
## 200223270003_R02C01 0.2924138 0.05767659 0.1605530 0.1345128 0.8438718
## 200223270003_R03C01 0.7065693 0.59131778 0.9333421 0.2830672 0.8548886
## 200223270003_R06C01 0.2895440 0.06939623 0.8737362 0.8194681 0.7786685
## 200223270003_R07C01 0.6422955 0.04527733 0.9137334 0.9007710 0.8260541
## 200223270006_R01C01 0.8471236 0.05212904 0.9169706 0.2603027 0.3295384
## 200223270006_R04C01 0.6949888 0.05104033 0.8890004 0.9207672 0.8541667
df_LRM1<-processed_data
featureName_LRM1<-AfterProcess_FeatureName
library(glmnet)
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_LRM1$DX, p = 0.7, list = FALSE)
trainData <- df_LRM1[trainIndex, ]
testData <- df_LRM1[-trainIndex, ]
dim(trainData)
## [1] 455 156
dim(testData)
## [1] 193 156
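`createDataPartition` draws the 70/30 split stratified on `DX`, so the class proportions should be close in both partitions; a quick sketch to verify this on the objects above:

```r
# Sketch: check that the stratified split preserves the DX class proportions.
round(prop.table(table(trainData$DX)), 3)
round(prop.table(table(testData$DX)), 3)
```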
ctrl <- trainControl(method = "cv", number = 5)
model_LRM1 <- caret::train(DX ~ ., data = trainData, method = "glmnet", trControl = ctrl)
predictions <- predict(model_LRM1, newdata = testData,type="raw")
cm_FeatEval_Median_LRM1<-caret::confusionMatrix(predictions, testData$DX)
print(cm_FeatEval_Median_LRM1)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN Dementia MCI
## CN 46 7 14
## Dementia 3 10 4
## MCI 17 11 81
##
## Overall Statistics
##
## Accuracy : 0.7098
## 95% CI : (0.6403, 0.7728)
## No Information Rate : 0.513
## P-Value [Acc > NIR] : 2.018e-08
##
## Kappa : 0.4987
##
## Mcnemar's Test P-Value : 0.1607
##
## Statistics by Class:
##
## Class: CN Class: Dementia Class: MCI
## Sensitivity 0.6970 0.35714 0.8182
## Specificity 0.8346 0.95758 0.7021
## Pos Pred Value 0.6866 0.58824 0.7431
## Neg Pred Value 0.8413 0.89773 0.7857
## Prevalence 0.3420 0.14508 0.5130
## Detection Rate 0.2383 0.05181 0.4197
## Detection Prevalence 0.3472 0.08808 0.5648
## Balanced Accuracy 0.7658 0.65736 0.7602
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
cm_FeatEval_Median_LRM1_Accuracy <- cm_FeatEval_Median_LRM1$overall["Accuracy"]
cm_FeatEval_Median_LRM1_Kappa <- cm_FeatEval_Median_LRM1$overall["Kappa"]
print(cm_FeatEval_Median_LRM1_Accuracy)
## Accuracy
## 0.7098446
print(cm_FeatEval_Median_LRM1_Kappa)
## Kappa
## 0.4987013
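The reported Kappa can be reproduced by hand from the confusion matrix above: it compares the observed agreement with the agreement expected by chance from the row and column margins. A short base-R sketch using the counts printed earlier:

```r
# Cohen's kappa from the confusion matrix above (rows = predicted, cols = reference).
tab <- matrix(c(46,  7, 14,
                 3, 10,  4,
                17, 11, 81), nrow = 3, byrow = TRUE)
n <- sum(tab)
p_obs <- sum(diag(tab)) / n                      # observed agreement (accuracy)
p_exp <- sum(rowSums(tab) * colSums(tab)) / n^2  # chance agreement from the margins
(p_obs - p_exp) / (1 - p_exp)                    # ≈ 0.4987, matching the output above
```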
print(model_LRM1)
## glmnet
##
## 455 samples
## 155 predictors
## 3 classes: 'CN', 'Dementia', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 364, 365, 363, 364, 364
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0.10 0.0001810831 0.6350263 0.3962356
## 0.10 0.0018108309 0.6460636 0.4102125
## 0.10 0.0181083090 0.6548792 0.4144240
## 0.55 0.0001810831 0.6263550 0.3765308
## 0.55 0.0018108309 0.6505792 0.4121576
## 0.55 0.0181083090 0.6483336 0.3870111
## 1.00 0.0001810831 0.6065010 0.3457739
## 1.00 0.0018108309 0.6394930 0.3907984
## 1.00 0.0181083090 0.5867925 0.2663062
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.01810831.
train_predictions <- predict(model_LRM1, newdata = trainData, type = "raw")
train_accuracy <- mean(train_predictions == trainData$DX)
FeatEval_Median_LRM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.96043956043956"
print(FeatEval_Median_LRM1_trainAccuracy)
## [1] 0.9604396
mean_accuracy_model_LRM1 <- mean(model_LRM1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM1)
## [1] 0.6326693
FeatEval_Median_mean_accuracy_cv_LRM1 <- mean_accuracy_model_LRM1
print(FeatEval_Median_mean_accuracy_cv_LRM1)
## [1] 0.6326693
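Note that `mean(model_LRM1$results$Accuracy)` averages over the entire alpha/lambda grid, including poorly tuned fits. If instead the resampling accuracy of only the selected tuning parameters is wanted, one common caret idiom is to join `results` with `bestTune` (a sketch against the model object above):

```r
# Sketch: CV accuracy at the chosen tuning parameters only,
# rather than the mean over the whole alpha/lambda grid.
best <- merge(model_LRM1$results, model_LRM1$bestTune)
best$Accuracy  # in the run above, alpha = 0.1 and lambda = 0.0181 give ~0.655
```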
library(caret)
library(pROC)
if (METHOD_FEATURE_FLAG == 5) {
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "MCI"],
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_LRM1_AUC <- auc_value
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG == 6) {
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "Dementia"],
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_LRM1_AUC <- auc_value
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 3) {
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "CI"],
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Median_LRM1_AUC <- auc_value
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1) {
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)
  for (class in classes) {
    binary_labels <- ifelse(testData$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  plot(roc_curves[[1]], col = "blue",
       lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 1:length(classes) + 1, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.8487
## The AUC value for class CN is: 0.8487235
##
## Class: Dementia
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.831
## The AUC value for class Dementia is: 0.8309524
##
## Class: MCI
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.819
## The AUC value for class MCI is: 0.8190415
if (METHOD_FEATURE_FLAG == 1) {
  mean_auc <- mean(auc_values)
  cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
  FeatEval_Median_LRM1_AUC <- mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.8329058
print(FeatEval_Median_LRM1_AUC)
## [1] 0.8329058
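As a cross-check on the one-versus-rest mean, pROC also offers `multiclass.roc`, which accepts the class-probability matrix directly and averages AUCs over class pairs (Hand-Till style), so its value will generally differ slightly from the macro mean above. A sketch:

```r
library(pROC)
# Sketch: built-in multi-class AUC from the probability matrix;
# pairwise (Hand-Till) averaging, so it need not match the one-vs-rest mean exactly.
mroc <- multiclass.roc(testData$DX, prob_predictions)
auc(mroc)
```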
importance_model_LRM1 <- varImp(model_LRM1)
print(importance_model_LRM1)
## glmnet variable importance
##
## variables are sorted by maximum importance across the classes
## only 20 most important variables shown (out of 155)
##
## CN Dementia MCI
## PC1 90.425 1.000e+02 0.000
## PC2 46.685 7.872e+01 0.000
## PC3 5.951 0.000e+00 68.294
## cg00962106 63.062 1.183e+01 36.941
## cg02225060 23.025 1.263e+01 51.150
## cg14710850 49.622 8.389e+00 25.399
## cg27452255 49.062 1.787e+01 11.822
## cg02981548 26.231 5.636e+00 49.023
## cg08861434 48.675 0.000e+00 42.758
## cg19503462 25.904 4.810e+01 5.776
## cg07152869 27.981 4.673e+01 1.351
## cg16749614 11.548 1.797e+01 45.950
## cg05096415 1.408 4.492e+01 28.936
## cg23432430 44.232 3.494e+00 25.269
## cg17186592 3.085 4.200e+01 26.692
## cg00247094 15.876 4.167e+01 10.433
## cg09584650 41.425 6.526e+00 18.542
## cg11133939 24.211 4.538e-03 40.491
## cg16715186 39.196 7.692e+00 17.052
## cg03129555 12.445 3.860e+01 8.423
plot(importance_model_LRM1, top = 20, main = "Variable Importance Plot")
importance_model_LRM1_df<-importance_model_LRM1$importance
if (METHOD_FEATURE_FLAG == 3 || METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG == 5 || METHOD_FEATURE_FLAG == 6) {
  importance_final_model_LRM1 <- varImp(model_LRM1$finalModel)
  library(dplyr)
  ordered_importance_final_model_LRM1 <- importance_final_model_LRM1 %>% arrange(desc(Overall))
  print(ordered_importance_final_model_LRM1)
}
if (METHOD_FEATURE_FLAG == 1) {
  # For the multi-class case, keep each feature's maximum importance across the classes
  importance_model_LRM1_df$Feature <- rownames(importance_model_LRM1_df)
  importance_model_LRM1_df <- importance_model_LRM1_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))
  print(importance_model_LRM1_df)
}
## CN Dementia MCI Feature MaxImportance
## 1 90.4249959 1.000000e+02 0.00000000 PC1 100.0000000
## 2 46.6852511 7.871687e+01 0.00000000 PC2 78.7168695
## 3 5.9514284 0.000000e+00 68.29377031 PC3 68.2937703
## 4 63.0624779 1.183060e+01 36.94103318 cg00962106 63.0624779
## 5 23.0253134 1.262791e+01 51.15044480 cg02225060 51.1504448
## 6 49.6219760 8.389275e+00 25.39916547 cg14710850 49.6219760
## 7 49.0621672 1.787253e+01 11.82240946 cg27452255 49.0621672
## 8 26.2311272 5.636412e+00 49.02310927 cg02981548 49.0231093
## 9 48.6751550 0.000000e+00 42.75848513 cg08861434 48.6751550
## 10 25.9044959 4.810083e+01 5.77582047 cg19503462 48.1008290
## 11 27.9805710 4.672807e+01 1.35109756 cg07152869 46.7280747
## 12 11.5479104 1.797038e+01 45.95027254 cg16749614 45.9502725
## 13 1.4082456 4.491590e+01 28.93584840 cg05096415 44.9158984
## 14 44.2323996 3.493809e+00 25.26937151 cg23432430 44.2323996
## 15 3.0847630 4.200475e+01 26.69196564 cg17186592 42.0047521
## 16 15.8757027 4.166882e+01 10.43251038 cg00247094 41.6688243
## 17 41.4246158 6.526166e+00 18.54231378 cg09584650 41.4246158
## 18 24.2106288 4.537710e-03 40.49102060 cg11133939 40.4910206
## 19 39.1961911 7.691869e+00 17.05238456 cg16715186 39.1961911
## 20 12.4453101 3.860078e+01 8.42261474 cg03129555 38.6007810
## 21 3.1873741 2.010300e+01 38.48198122 cg08857872 38.4819812
## 22 12.1355897 3.683036e+01 11.12168078 cg06864789 36.8303605
## 23 0.0000000 3.529451e+01 26.74302862 cg14924512 35.2945137
## 24 7.2093862 1.187674e+01 34.91993102 cg16652920 34.9199310
## 25 19.1196758 3.458789e+01 0.00000000 cg03084184 34.5878874
## 26 3.6622134 1.336219e+01 34.16757960 cg26219488 34.1675796
## 27 13.4786533 3.380525e+01 6.06343454 cg20913114 33.8052491
## 28 7.1379682 3.346823e+01 11.81330708 cg06378561 33.4682333
## 29 33.3195658 1.548121e+01 2.09663452 cg26948066 33.3195658
## 30 0.5659275 3.328737e+01 17.47322453 cg25259265 33.2873719
## 31 33.2741078 0.000000e+00 21.54049071 cg06536614 33.2741078
## 32 1.6389436 3.231544e+01 17.24929833 cg24859648 32.3154384
## 33 12.7583748 3.077107e+01 2.19901369 cg12279734 30.7710702
## 34 30.6869142 1.115687e+01 2.49273174 cg03982462 30.6869142
## 35 1.2161924 3.061483e+01 16.61061643 cg05841700 30.6148318
## 36 29.8384323 7.653346e+00 7.72518875 cg11227702 29.8384323
## 37 25.3670810 0.000000e+00 29.02265454 cg12146221 29.0226545
## 38 9.6440880 8.953102e+00 28.93661162 cg02621446 28.9366116
## 39 0.0000000 2.259029e+01 28.84500791 cg00616572 28.8450079
## 40 28.4448781 8.986950e+00 6.53998672 cg15535896 28.4448781
## 41 25.4389044 0.000000e+00 28.23931149 cg02372404 28.2393115
## 42 5.0575037 2.778766e+01 8.14605297 cg09854620 27.7876601
## 43 27.5958249 0.000000e+00 15.87407881 cg04248279 27.5958249
## 44 4.0039872 7.702389e+00 27.54323251 cg20678988 27.5432325
## 45 0.0000000 2.751513e+01 13.83386590 cg24861747 27.5151349
## 46 27.4742117 1.566503e+01 0.00000000 cg10240127 27.4742117
## 47 7.7737177 7.231522e+00 27.22250002 cg16771215 27.2225000
## 48 0.6492323 2.697669e+01 14.65073407 cg01667144 26.9766923
## 49 26.9399272 8.941344e+00 2.81296798 cg13080267 26.9399272
## 50 0.0000000 2.614333e+01 26.59118756 cg02494911 26.5911876
## 51 9.3803376 2.645868e+01 5.12832350 cg10750306 26.4586835
## 52 25.4634769 1.206253e+00 11.27077718 cg11438323 25.4634769
## 53 4.8762323 4.031203e+00 25.42783468 cg06715136 25.4278347
## 54 25.1331048 0.000000e+00 15.36655565 cg04412904 25.1331048
## 55 4.7684737 2.484828e+01 5.39878099 cg12738248 24.8482839
## 56 24.4026477 0.000000e+00 18.67996447 cg03071582 24.4026477
## 57 0.0000000 2.429592e+01 15.80574961 cg05570109 24.2959228
## 58 24.2220488 2.028670e+01 0.00000000 cg15775217 24.2220488
## 59 0.0000000 1.995221e+01 24.19626839 cg24873924 24.1962684
## 60 7.5586975 4.158452e+00 24.13453309 cg17738613 24.1345331
## 61 23.8194963 0.000000e+00 20.82012346 cg01921484 23.8194963
## 62 0.0000000 1.631677e+01 23.68025333 cg10369879 23.6802533
## 63 0.0000000 1.839911e+01 23.65030836 cg27341708 23.6503084
## 64 0.0000000 2.355460e+01 21.43182704 cg12534577 23.5546047
## 65 0.0000000 2.341380e+01 17.83628431 cg18821122 23.4137998
## 66 4.6163643 6.919010e+00 23.35074707 cg12682323 23.3507471
## 67 23.3205645 0.000000e+00 14.18835420 cg05234269 23.3205645
## 68 23.0340417 0.000000e+00 22.81275180 cg20685672 23.0340417
## 69 20.3877963 0.000000e+00 22.84051946 cg12228670 22.8405195
## 70 22.7103929 3.661257e+00 8.33658882 cg11331837 22.7103929
## 71 0.0000000 2.269135e+01 20.85644106 cg01680303 22.6913512
## 72 22.4135654 1.167086e+00 10.22120185 cg17421046 22.4135654
## 73 22.2717800 8.042375e+00 2.25958376 cg03088219 22.2717800
## 74 22.2642367 1.930200e+01 0.00000000 cg00322003 22.2642367
## 75 22.2444520 1.528002e+01 0.00000000 cg02356645 22.2444520
## 76 5.8948426 2.207822e+01 1.26185244 cg01013522 22.0782243
## 77 12.6157774 0.000000e+00 21.83055417 cg00272795 21.8305542
## 78 21.6475067 0.000000e+00 14.53413956 cg25758034 21.6475067
## 79 4.7766387 2.163354e+01 1.18820728 cg26474732 21.6335393
## 80 0.0000000 2.128988e+01 17.62871803 cg16579946 21.2898785
## 81 21.2110158 4.532875e+00 5.64862793 cg11187460 21.2110158
## 82 9.6192362 2.120815e+01 0.00000000 cg07523188 21.2081474
## 83 0.0000000 1.703337e+01 20.79581060 cg14527649 20.7958106
## 84 2.7306253 4.862647e+00 20.54616679 cg20370184 20.5461668
## 85 20.5342917 0.000000e+00 13.71034162 cg17429539 20.5342917
## 86 0.0000000 2.028684e+01 10.03089297 cg20507276 20.2868418
## 87 1.1826772 6.821757e+00 20.19751808 cg13885788 20.1975181
## 88 0.0000000 1.557720e+01 20.08673687 cg16178271 20.0867369
## 89 5.5921343 1.527010e+00 19.98502402 cg10738648 19.9850240
## 90 5.1468761 1.991958e+01 2.75466494 cg26069044 19.9195759
## 91 3.1971419 4.954795e+00 19.79319869 cg25879395 19.7931987
## 92 19.6440926 0.000000e+00 12.12328511 cg06112204 19.6440926
## 93 3.2284298 1.923166e+01 1.27197851 cg23161429 19.2316573
## 94 19.0437246 0.000000e+00 8.87119081 cg25436480 19.0437246
## 95 18.8943565 1.898245e+01 0.00000000 cg26757229 18.9824479
## 96 18.8530531 8.147457e+00 0.00000000 cg02932958 18.8530531
## 97 6.3452415 1.863514e+01 0.95794695 cg18339359 18.6351383
## 98 18.5829115 1.503090e+00 1.89843568 cg06950937 18.5829115
## 99 12.0369279 1.857703e+01 0.00000000 cg23916408 18.5770326
## 100 1.5243862 3.188027e+00 18.16777827 cg12784167 18.1677783
## 101 11.8999547 0.000000e+00 18.13677901 cg07480176 18.1367790
## 102 0.0000000 5.486660e+00 17.70082105 cg15865722 17.7008211
## 103 17.6735632 0.000000e+00 13.05289022 cg27577781 17.6735632
## 104 17.1592611 2.949446e+00 2.52035333 cg05321907 17.1592611
## 105 16.8576564 0.000000e+00 7.58718214 cg03660162 16.8576564
## 106 16.7701657 0.000000e+00 9.90995624 cg07138269 16.7701657
## 107 16.7249896 6.150798e-04 5.47347172 cg20139683 16.7249896
## 108 1.5108482 1.661274e+01 3.60041267 cg12284872 16.6127427
## 109 16.5452643 0.000000e+00 15.32420554 cg03327352 16.5452643
## 110 0.0000000 1.652720e+01 12.91758315 cg23658987 16.5272039
## 111 0.0000000 1.475029e+01 16.17924038 cg21854924 16.1792404
## 112 15.7844996 0.000000e+00 6.83817946 cg21697769 15.7844996
## 113 15.6692581 5.743518e+00 0.00000000 cg19512141 15.6692581
## 114 10.3234169 0.000000e+00 15.46946157 cg08198851 15.4694616
## 115 0.4310200 1.509647e+01 0.82976761 cg00675157 15.0964673
## 116 0.0000000 5.686212e+00 15.01391492 cg01153376 15.0139149
## 117 1.8017259 1.495919e+01 0.76736950 cg01933473 14.9591899
## 118 14.9059930 0.000000e+00 4.57584668 cg12776173 14.9059930
## 119 0.0000000 1.067547e+01 14.71662168 cg14564293 14.7166217
## 120 12.4069328 0.000000e+00 14.57808091 cg24851651 14.5780809
## 121 0.0000000 1.452148e+01 2.25091078 cg22274273 14.5214828
## 122 12.7834657 1.451780e+01 0.00000000 cg25561557 14.5177981
## 123 13.7922467 1.439133e+01 0.00000000 cg21209485 14.3913274
## 124 3.8865187 1.430786e+01 0.00000000 cg10985055 14.3078613
## 125 8.0980649 0.000000e+00 14.23052412 cg14293999 14.2305241
## 126 0.0000000 6.088317e+00 13.97288400 cg18819889 13.9728840
## 127 7.9101725 1.390343e+01 0.00000000 cg24506579 13.9034342
## 128 10.4921193 0.000000e+00 13.81747243 cg19377607 13.8174724
## 129 2.6292166 1.360085e+01 0.00000000 cg06697310 13.6008462
## 130 13.5762704 0.000000e+00 10.14995156 cg00696044 13.5762704
## 131 0.0000000 0.000000e+00 13.11702130 cg01549082 13.1170213
## 132 0.0000000 6.905304e+00 13.06761511 cg01128042 13.0676151
## 133 0.2698545 1.248616e+01 1.16140010 cg00999469 12.4861632
## 134 0.0000000 1.079484e+01 12.40421487 cg06118351 12.4042149
## 135 0.0000000 1.124273e+01 11.78794942 cg12012426 11.7879494
## 136 11.7349496 9.445909e+00 0.00000000 cg08584917 11.7349496
## 137 0.0000000 1.168263e+01 2.24819851 cg15633912 11.6826262
## 138 11.6820808 0.000000e+00 11.18100448 cg27272246 11.6820808
## 139 11.3463436 1.979374e+00 0.00000000 cg17906851 11.3463436
## 140 1.2024457 1.133501e+01 0.00000000 cg16788319 11.3350060
## 141 8.9965158 0.000000e+00 11.28265946 cg07028768 11.2826595
## 142 0.0000000 3.118110e+00 10.75375455 cg27086157 10.7537545
## 143 1.8118794 9.619556e+00 0.00000000 cg14240646 9.6195558
## 144 0.0000000 9.458924e+00 9.21780985 cg00154902 9.4589241
## 145 6.6601294 0.000000e+00 9.11035233 cg14307563 9.1103523
## 146 0.0000000 8.513866e+00 0.00000000 cg02320265 8.5138660
## 147 8.2069811 0.000000e+00 7.04448942 cg08779649 8.2069811
## 148 7.6741533 0.000000e+00 7.94681309 cg04664583 7.9468131
## 149 0.0000000 0.000000e+00 6.60014051 cg12466610 6.6001405
## 150 6.2362459 3.714491e+00 0.00000000 cg27639199 6.2362459
## 151 0.0000000 0.000000e+00 5.82266885 cg15501526 5.8226689
## 152 0.0000000 4.835409e+00 3.66050252 cg00689685 4.8354086
## 153 2.8005491 0.000000e+00 0.07693353 cg01413796 2.8005491
## 154 0.0000000 0.000000e+00 2.13030107 cg11247378 2.1303011
## 155 0.5215519 0.000000e+00 0.63597308 age.now 0.6359731
if (!require(reshape2)) {
  install.packages("reshape2")
  library(reshape2)
}
if (METHOD_FEATURE_FLAG == 1) {
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")
  ggplot(importance_melted_LRM1_df,
         aes(x = reorder(Feature, -Importance),
             y = Importance, fill = Class)) +
    geom_bar(stat = "identity", position = "dodge") +
    coord_flip() +
    labs(title = "Feature Importance Across Classes",
         x = "Feature",
         y = "Importance",
         fill = "Class") +
    theme_minimal()
}
if (METHOD_FEATURE_FLAG == 1) {
  print(importance_model_LRM1_df %>% head(20))
  print("The top 20 features by maximum importance:")
  print(head(importance_model_LRM1_df, n = 20)$Feature)
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    head(20) %>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")
  ggplot(importance_melted_LRM1_df,
         aes(x = reorder(Feature, -Importance),
             y = Importance, fill = Class)) +
    geom_bar(stat = "identity", position = "dodge") +
    coord_flip() +
    labs(title = "Feature Importance Across Classes",
         x = "Feature",
         y = "Importance",
         fill = "Class") +
    theme_minimal()
}
## CN Dementia MCI Feature MaxImportance
## 1 90.424996 100.00000000 0.000000 PC1 100.00000
## 2 46.685251 78.71686950 0.000000 PC2 78.71687
## 3 5.951428 0.00000000 68.293770 PC3 68.29377
## 4 63.062478 11.83059646 36.941033 cg00962106 63.06248
## 5 23.025313 12.62790827 51.150445 cg02225060 51.15044
## 6 49.621976 8.38927495 25.399165 cg14710850 49.62198
## 7 49.062167 17.87253273 11.822409 cg27452255 49.06217
## 8 26.231127 5.63641231 49.023109 cg02981548 49.02311
## 9 48.675155 0.00000000 42.758485 cg08861434 48.67516
## 10 25.904496 48.10082896 5.775820 cg19503462 48.10083
## 11 27.980571 46.72807471 1.351098 cg07152869 46.72807
## 12 11.547910 17.97037830 45.950273 cg16749614 45.95027
## 13 1.408246 44.91589839 28.935848 cg05096415 44.91590
## 14 44.232400 3.49380894 25.269372 cg23432430 44.23240
## 15 3.084763 42.00475211 26.691966 cg17186592 42.00475
## 16 15.875703 41.66882434 10.432510 cg00247094 41.66882
## 17 41.424616 6.52616573 18.542314 cg09584650 41.42462
## 18 24.210629 0.00453771 40.491021 cg11133939 40.49102
## 19 39.196191 7.69186882 17.052385 cg16715186 39.19619
## 20 12.445310 38.60078097 8.422615 cg03129555 38.60078
## [1] "The top 20 features by maximum importance:"
## [1] "PC1" "PC2" "PC3" "cg00962106" "cg02225060" "cg14710850" "cg27452255"
## [8] "cg02981548" "cg08861434" "cg19503462" "cg07152869" "cg16749614" "cg05096415" "cg23432430"
## [15] "cg17186592" "cg00247094" "cg09584650" "cg11133939" "cg16715186" "cg03129555"
table(df_LRM1$DX)
##
## CN Dementia MCI
## 221 94 333
prop.table(table(df_LRM1$DX))
##
## CN Dementia MCI
## 0.3410494 0.1450617 0.5138889
table(trainData$DX)
##
## CN Dementia MCI
## 155 66 234
prop.table(table(trainData$DX))
##
## CN Dementia MCI
## 0.3406593 0.1450549 0.5142857
barplot(table(df_LRM1$DX), main = "Whole Data Class Distribution")
For the training data set:
barplot(table(trainData$DX), main = "Train Data Class Distribution")
Let’s calculate the imbalance ratio: the number of samples in the majority class divided by the number of samples in the minority class. A high ratio indicates severe class imbalance.
class_counts <- table(df_LRM1$DX)
imbalance_ratio <- max(class_counts) / min(class_counts)
print("The imbalance ratio of the whole data set is:")
## [1] "The imbalance ratio of the whole data set is:"
print(imbalance_ratio)
## [1] 3.542553
class_counts <- table(trainData$DX)
imbalance_ratio <- max(class_counts) / min(class_counts)
print("The imbalance ratio of the training data set is:")
## [1] "The imbalance ratio of the training data set is:"
print(imbalance_ratio)
## [1] 3.545455
Let’s run a Chi-squared test, which determines whether the class distribution deviates significantly from a balanced (uniform) distribution. A small p-value indicates significant class imbalance.
chisq.test(table(df_LRM1$DX))
##
## Chi-squared test for given probabilities
##
## data: table(df_LRM1$DX)
## X-squared = 132.4, df = 2, p-value < 2.2e-16
chisq.test(table(trainData$DX))
##
## Chi-squared test for given probabilities
##
## data: table(trainData$DX)
## X-squared = 93.156, df = 2, p-value < 2.2e-16
library(smotefamily)
smote_data_LGR_1 <- SMOTE(X = trainData[, !names(trainData) %in% "DX"], target = trainData$DX, K = 5, dup_size = 1)
balanced_data_LGR_1 <- smote_data_LGR_1$data
colnames(balanced_data_LGR_1)[colnames(balanced_data_LGR_1) == "class"] <- "DX"
table(balanced_data_LGR_1$DX)
##
## CN Dementia MCI
## 155 132 234
dim(balanced_data_LGR_1)
## [1] 521 156
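A quick sanity check on these counts (assumption: smotefamily's `SMOTE()` generates `dup_size` synthetic samples per original minority observation, so the minority class grows (dup_size + 1)-fold):

```r
# Hypothetical check of the SMOTE counts above, assuming dup_size = k
# yields k synthetic samples per original minority observation.
minority_before <- 66                      # Dementia count in trainData
dup_size <- 1
minority_after <- minority_before * (dup_size + 1)
minority_after                             # 132, matching the table above
```

This matches the Dementia count going from 66 in trainData to 132 in balanced_data_LGR_1, while CN (155) and MCI (234) are unchanged.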
ctrl <- trainControl(method = "cv", number = 5)
model_LRM2 <- caret::train(DX ~ ., data = balanced_data_LGR_1, method = "glmnet", trControl = ctrl)
predictions <- predict(model_LRM2, newdata = testData)
caret::confusionMatrix(predictions, testData$DX)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN Dementia MCI
## CN 45 6 15
## Dementia 4 11 6
## MCI 17 11 78
##
## Overall Statistics
##
## Accuracy : 0.6943
## 95% CI : (0.6241, 0.7584)
## No Information Rate : 0.513
## P-Value [Acc > NIR] : 2.356e-07
##
## Kappa : 0.4779
##
## Mcnemar's Test P-Value : 0.5733
##
## Statistics by Class:
##
## Class: CN Class: Dementia Class: MCI
## Sensitivity 0.6818 0.39286 0.7879
## Specificity 0.8346 0.93939 0.7021
## Pos Pred Value 0.6818 0.52381 0.7358
## Neg Pred Value 0.8346 0.90116 0.7586
## Prevalence 0.3420 0.14508 0.5130
## Detection Rate 0.2332 0.05699 0.4041
## Detection Prevalence 0.3420 0.10881 0.5492
## Balanced Accuracy 0.7582 0.66613 0.7450
print(model_LRM2)
## glmnet
##
## 521 samples
## 155 predictors
## 3 classes: 'CN', 'Dementia', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 416, 417, 417, 417, 417
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0.10 0.000186946 0.7103114 0.5552305
## 0.10 0.001869460 0.7121978 0.5563269
## 0.10 0.018694597 0.7160989 0.5621857
## 0.55 0.000186946 0.7007143 0.5397047
## 0.55 0.001869460 0.7102930 0.5525186
## 0.55 0.018694597 0.6872894 0.5142517
## 1.00 0.000186946 0.6834432 0.5136505
## 1.00 0.001869460 0.7006777 0.5383593
## 1.00 0.018694597 0.6468864 0.4489232
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.0186946.
train_predictions <- predict(model_LRM2, newdata = trainData, type = "raw")
train_accuracy <- mean(train_predictions == trainData$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.958241758241758"
mean_accuracy_model_LRM2 <- mean(model_LRM2$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM2)
## [1] 0.6964347
importance_model_LRM2 <- varImp(model_LRM2)
print(importance_model_LRM2)
## glmnet variable importance
##
## variables are sorted by maximum importance across the classes
## only 20 most important variables shown (out of 155)
##
## CN Dementia MCI
## PC1 80.679 100.000 0.000
## PC2 38.884 80.691 0.000
## cg00962106 56.198 9.095 33.495
## PC3 7.501 0.000 55.870
## cg19503462 26.315 48.639 6.536
## cg27452255 47.903 21.179 8.088
## cg07152869 27.968 45.986 1.294
## cg05096415 3.337 45.587 28.318
## cg02225060 18.272 12.775 45.585
## cg14710850 45.324 8.651 21.701
## cg02981548 23.097 5.920 45.302
## cg08861434 44.863 0.000 36.602
## cg03129555 14.450 42.015 10.562
## cg23432430 41.988 6.875 20.297
## cg16749614 8.925 17.010 41.737
## cg17186592 3.590 40.130 25.168
## cg14924512 1.855 38.982 23.221
## cg09584650 38.240 7.576 15.082
## cg06864789 13.558 38.083 11.893
## cg03084184 19.822 37.834 3.055
plot(importance_model_LRM2, top = 20, main = "Variable Importance Plot")
importance_model_LRM2_df<-importance_model_LRM2$importance
if (METHOD_FEATURE_FLAG %in% c(3, 4, 5, 6)) {
  importance_final_model_LRM2 <- varImp(model_LRM2$finalModel)
  library(dplyr)
  ordered_importance_final_model_LRM2 <- importance_final_model_LRM2 %>%
    arrange(desc(Overall))
  print(ordered_importance_final_model_LRM2)
}
if (METHOD_FEATURE_FLAG == 1) {
  # In the multi-class case, rank each feature by its maximum
  # importance value across the three classes.
  importance_model_LRM2_df$Feature <- rownames(importance_model_LRM2_df)
  importance_model_LRM2_df <- importance_model_LRM2_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))
  print(importance_model_LRM2_df)
}
## CN Dementia MCI Feature MaxImportance
## 1 80.679428863 100.00000000 0.000000000 PC1 100.0000000
## 2 38.883792330 80.69142135 0.000000000 PC2 80.6914213
## 3 56.198105473 9.09453097 33.494895269 cg00962106 56.1981055
## 4 7.501419843 0.00000000 55.870041073 PC3 55.8700411
## 5 26.314966715 48.63888581 6.535588437 cg19503462 48.6388858
## 6 47.903297796 21.17904018 8.087900003 cg27452255 47.9032978
## 7 27.968017421 45.98633975 1.294250776 cg07152869 45.9863398
## 8 3.336642534 45.58715847 28.317868132 cg05096415 45.5871585
## 9 18.272375469 12.77464831 45.585242833 cg02225060 45.5852428
## 10 45.324125992 8.65147591 21.700847291 cg14710850 45.3241260
## 11 23.097249751 5.92030775 45.302133511 cg02981548 45.3021335
## 12 44.863400323 0.00000000 36.602318185 cg08861434 44.8634003
## 13 14.450136041 42.01469779 10.561940960 cg03129555 42.0146978
## 14 41.988198265 6.87510878 20.296659603 cg23432430 41.9881983
## 15 8.925155758 17.01038504 41.736552660 cg16749614 41.7365527
## 16 3.590290791 40.13003730 25.168269158 cg17186592 40.1300373
## 17 1.854949730 38.98245410 23.221432408 cg14924512 38.9824541
## 18 38.239988922 7.57626268 15.081591010 cg09584650 38.2399889
## 19 13.558155574 38.08328750 11.893308719 cg06864789 38.0832875
## 20 19.822087774 37.83431155 3.055188849 cg03084184 37.8343116
## 21 21.496100293 0.52009240 37.528929069 cg11133939 37.5289291
## 22 13.593984894 37.18964818 9.117936907 cg00247094 37.1896482
## 23 0.535099440 20.67938057 35.715187958 cg08857872 35.7151880
## 24 35.480845254 7.95874622 14.041837327 cg16715186 35.4808453
## 25 4.934937153 35.04718906 17.442152969 cg24859648 35.0471891
## 26 14.089726349 34.55372136 5.433977699 cg12279734 34.5537214
## 27 1.727162103 34.09962713 18.442895227 cg25259265 34.0996271
## 28 8.425461156 34.05545001 11.634704145 cg06378561 34.0554500
## 29 2.317735890 13.36092372 31.981929900 cg26219488 31.9819299
## 30 12.464527158 31.58726203 5.781375790 cg20913114 31.5872620
## 31 5.482910493 11.24716578 31.373100048 cg16652920 31.3731000
## 32 1.406006839 30.96767657 17.376552573 cg05841700 30.9676766
## 33 29.671776171 14.07606681 0.800915538 cg26948066 29.6717762
## 34 28.733576534 12.27805126 0.034366374 cg03982462 28.7335765
## 35 28.251208564 8.08782117 6.647490164 cg11227702 28.2512086
## 36 6.457033630 28.04864581 8.136034298 cg09854620 28.0486458
## 37 27.485718562 0.00000000 21.536098995 cg06536614 27.4857186
## 38 7.543830482 9.69486687 27.094786126 cg02621446 27.0947861
## 39 0.000000000 26.98547093 24.131491853 cg02494911 26.9854709
## 40 20.458258549 0.00000000 26.629992272 cg12146221 26.6299923
## 41 0.000000000 25.78694507 26.611100611 cg00616572 26.6111006
## 42 9.536723526 26.42939213 5.651344888 cg10750306 26.4293921
## 43 26.168568224 7.88086972 6.037506608 cg15535896 26.1685682
## 44 1.141041466 25.93481794 13.648772957 cg01667144 25.9348179
## 45 0.000000000 25.62797982 13.464484954 cg24861747 25.6279798
## 46 25.545489329 15.10535058 0.000000000 cg10240127 25.5454893
## 47 24.102526327 0.00000000 25.136745728 cg02372404 25.1367457
## 48 1.110369810 8.19438850 25.058064292 cg06715136 25.0580643
## 49 24.827775106 0.00000000 16.155963858 cg20685672 24.8277751
## 50 0.000000000 24.76952655 14.635617490 cg05570109 24.7695265
## 51 24.719596431 0.00000000 13.464185675 cg04248279 24.7195964
## 52 4.027296440 5.52070774 24.339307078 cg20678988 24.3393071
## 53 0.000000000 24.19961748 18.409106406 cg12534577 24.1996175
## 54 0.000000000 24.15328295 15.846410021 cg16579946 24.1532829
## 55 4.826664207 24.12335487 5.714614685 cg12738248 24.1233549
## 56 6.533547554 5.92725750 24.066141752 cg16771215 24.0661418
## 57 23.998797144 10.16532801 0.028545462 cg13080267 23.9987971
## 58 5.506607573 5.67362617 23.067400417 cg17738613 23.0674004
## 59 22.320443120 6.53640040 5.659119155 cg11331837 22.3204431
## 60 0.000000000 22.28339674 17.216096991 cg01680303 22.2833967
## 61 22.210310684 0.00000000 13.206930759 cg04412904 22.2103107
## 62 0.000000000 22.06548677 14.956522108 cg18821122 22.0654868
## 63 3.420024153 7.31988987 22.051844895 cg12682323 22.0518449
## 64 22.042423499 16.24236556 0.000000000 cg02356645 22.0424235
## 65 0.000000000 20.83329334 22.027588678 cg24873924 22.0275887
## 66 0.000000000 15.83377891 21.998528622 cg10369879 21.9985286
## 67 6.480617738 21.72590957 0.939133148 cg01013522 21.7259096
## 68 16.495721453 0.00000000 21.583480114 cg12228670 21.5834801
## 69 7.519511314 21.11628056 0.000000000 cg07523188 21.1162806
## 70 21.103724796 18.09207717 0.000000000 cg15775217 21.1037248
## 71 20.985180608 0.00000000 16.898707961 cg03071582 20.9851806
## 72 20.943857120 0.00000000 12.124679760 cg05234269 20.9438571
## 73 0.000000000 20.89510385 7.918854390 cg20507276 20.8951039
## 74 0.000000000 19.10810611 20.829416683 cg27341708 20.8294167
## 75 13.165537730 20.45343113 0.000000000 cg25561557 20.4534311
## 76 20.436701259 8.86519380 0.349448845 cg03088219 20.4367013
## 77 20.387453896 0.00000000 19.555842902 cg01921484 20.3874539
## 78 4.715766302 20.18828004 4.199165451 cg26069044 20.1882800
## 79 20.128075536 0.00000000 7.556981121 cg06112204 20.1280755
## 80 20.076550652 0.00000000 10.293594056 cg25758034 20.0765507
## 81 20.065778545 0.22748177 9.400207555 cg17421046 20.0657785
## 82 19.735141486 0.00000000 9.881499267 cg17429539 19.7351415
## 83 19.731674390 0.00000000 12.798875580 cg11438323 19.7316744
## 84 19.532262668 14.86086455 0.000000000 cg00322003 19.5322627
## 85 19.322487367 4.15331038 4.746838336 cg11187460 19.3224874
## 86 2.510648117 5.41891238 18.970344056 cg25879395 18.9703441
## 87 4.055575121 18.84551312 0.228938987 cg26474732 18.8455131
## 88 2.893715763 18.78832756 2.430352370 cg23161429 18.7883276
## 89 1.682911789 4.78952445 18.695452533 cg20370184 18.6954525
## 90 18.641868330 0.02258014 6.337732939 cg25436480 18.6418683
## 91 0.009327755 7.64286248 18.625807773 cg13885788 18.6258078
## 92 11.435914221 18.25338438 0.000000000 cg23916408 18.2533844
## 93 0.000000000 16.67258119 18.154192689 cg14527649 18.1541927
## 94 5.003807356 1.01079493 18.046203985 cg10738648 18.0462040
## 95 0.000000000 17.96700771 12.794428811 cg23658987 17.9670077
## 96 5.991551785 17.95117220 1.290455798 cg18339359 17.9511722
## 97 10.240858920 0.00000000 17.847594160 cg07480176 17.8475942
## 98 16.799265388 17.79043659 0.000000000 cg26757229 17.7904366
## 99 2.974513464 17.78047141 4.060710433 cg12284872 17.7804714
## 100 8.047755488 17.46985680 0.000000000 cg24506579 17.4698568
## 101 17.448328611 8.51043165 0.000000000 cg02932958 17.4483286
## 102 13.323326679 0.00000000 17.355723517 cg00272795 17.3557235
## 103 0.000000000 7.44854412 17.195823649 cg12784167 17.1958236
## 104 16.752406581 0.00000000 6.655601915 cg03660162 16.7524066
## 105 0.000000000 16.01388121 16.463252853 cg16178271 16.4632529
## 106 16.360904095 0.00000000 11.982490178 cg27577781 16.3609041
## 107 16.148865611 0.00000000 8.265317801 cg07138269 16.1488656
## 108 15.970109102 2.88564640 2.056078529 cg05321907 15.9701091
## 109 0.758550200 15.68809089 2.141928397 cg22274273 15.6880909
## 110 0.469063822 3.15344245 15.547390512 cg15865722 15.5473905
## 111 13.420379863 15.52876319 0.000000000 cg21209485 15.5287632
## 112 15.462630452 0.63364112 3.697728095 cg20139683 15.4626305
## 113 0.805213233 15.27248433 2.251238780 cg15633912 15.2724843
## 114 1.781942955 15.20891238 0.499178354 cg00675157 15.2089124
## 115 0.000000000 15.01349056 13.725064390 cg21854924 15.0134906
## 116 0.000000000 8.30445801 14.977088715 cg14564293 14.9770887
## 117 1.414740021 14.67400925 1.624599344 cg01933473 14.6740093
## 118 14.358410506 0.00000000 2.353605554 cg06950937 14.3584105
## 119 7.036160578 0.00000000 14.260960723 cg14293999 14.2609607
## 120 0.000000000 7.62763969 14.099430071 cg01128042 14.0994301
## 121 13.967034375 0.00000000 2.023110140 cg12776173 13.9670344
## 122 13.960184792 0.00000000 13.905326596 cg03327352 13.9601848
## 123 8.333769356 0.00000000 13.928550552 cg24851651 13.9285506
## 124 13.708623191 0.00000000 7.305663059 cg00696044 13.7086232
## 125 8.532678036 0.00000000 13.700028272 cg19377607 13.7000283
## 126 0.000000000 2.79030342 13.616930212 cg01153376 13.6169302
## 127 13.578106407 3.86934109 0.000000000 cg19512141 13.5781064
## 128 0.000000000 6.31061501 13.528579969 cg18819889 13.5285800
## 129 8.860694660 0.00000000 13.136946476 cg27272246 13.1369465
## 130 12.221081080 0.00000000 12.990255691 cg08198851 12.9902557
## 131 0.000000000 9.82358796 12.685100159 cg06118351 12.6851002
## 132 4.058933776 12.40341568 0.000000000 cg10985055 12.4034157
## 133 0.930376494 11.76495324 0.005222529 cg16788319 11.7649532
## 134 1.061079854 11.75211794 0.000000000 cg14240646 11.7521179
## 135 0.794740956 11.57126294 0.396509276 cg00999469 11.5712629
## 136 0.000000000 11.34774657 10.931613703 cg12012426 11.3477466
## 137 0.000000000 2.68390094 10.896653093 cg01549082 10.8966531
## 138 10.744293417 0.00000000 9.157358594 cg21697769 10.7442934
## 139 10.665484601 0.00000000 7.591554795 cg07028768 10.6654846
## 140 10.323399632 3.96769044 0.000000000 cg17906851 10.3233996
## 141 0.000000000 8.37319093 9.807447434 cg27086157 9.8074474
## 142 9.745983813 9.21937779 0.000000000 cg08584917 9.7459838
## 143 0.310904240 9.73868693 0.000000000 cg06697310 9.7386869
## 144 0.601000492 9.52007028 0.000000000 cg02320265 9.5200703
## 145 2.508816376 0.00000000 9.494379581 cg04664583 9.4943796
## 146 4.880776384 0.00000000 8.718261699 cg14307563 8.7182617
## 147 6.238606499 0.00000000 8.443705870 cg08779649 8.4437059
## 148 0.000000000 6.06930373 7.360511721 cg00154902 7.3605117
## 149 0.000000000 0.00000000 6.388083312 cg12466610 6.3880833
## 150 6.342855658 4.10143474 0.000000000 cg27639199 6.3428557
## 151 0.000000000 5.86185727 4.806007444 cg00689685 5.8618573
## 152 0.000000000 2.96250768 5.177422179 cg15501526 5.1774222
## 153 2.837884424 0.00000000 0.000000000 cg01413796 2.8378844
## 154 0.421228002 0.00000000 0.566051449 age.now 0.5660514
## 155 0.000000000 0.43807821 0.040984280 cg11247378 0.4380782
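The "max importance" ranking used above can be sketched on a toy table (feature names f1–f3 and all values are invented for illustration; this is the base-R equivalent of the `pmax()` + `arrange()` pipeline):

```r
# Toy per-class importance table (invented values)
toy <- data.frame(CN       = c(10, 80,  0),
                  Dementia = c(90,  5, 20),
                  MCI      = c(30, 40, 60),
                  Feature  = c("f1", "f2", "f3"))

# Row-wise maximum across classes, then sort descending
toy$MaxImportance <- pmax(toy$CN, toy$Dementia, toy$MCI)
ranked <- toy[order(-toy$MaxImportance), ]
ranked$Feature                             # "f1" "f2" "f3" (90, 80, 60)
```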
if (METHOD_FEATURE_FLAG == 1) {
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")
  ggplot(importance_melted_LRM2_df,
         aes(x = reorder(Feature, -Importance),
             y = Importance, fill = Class)) +
    geom_bar(stat = "identity", position = "dodge") +
    coord_flip() +
    labs(title = "Feature Importance Across Classes",
         x = "Feature",
         y = "Importance",
         fill = "Class") +
    theme_minimal()
}
if (METHOD_FEATURE_FLAG == 1) {
  print(importance_model_LRM2_df %>% head(20))
  print("The top 20 features by maximum importance:")
  print(head(importance_model_LRM2_df, n = 20)$Feature)
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    head(20) %>%
    dplyr::select(-MaxImportance) %>%
    reshape2::melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")
  ggplot(importance_melted_LRM2_df,
         aes(x = reorder(Feature, -Importance),
             y = Importance, fill = Class)) +
    geom_bar(stat = "identity", position = "dodge") +
    coord_flip() +
    labs(title = "Feature Importance Across Classes",
         x = "Feature",
         y = "Importance",
         fill = "Class") +
    theme_minimal()
}
## CN Dementia MCI Feature MaxImportance
## 1 80.679429 100.000000 0.000000 PC1 100.00000
## 2 38.883792 80.691421 0.000000 PC2 80.69142
## 3 56.198105 9.094531 33.494895 cg00962106 56.19811
## 4 7.501420 0.000000 55.870041 PC3 55.87004
## 5 26.314967 48.638886 6.535588 cg19503462 48.63889
## 6 47.903298 21.179040 8.087900 cg27452255 47.90330
## 7 27.968017 45.986340 1.294251 cg07152869 45.98634
## 8 3.336643 45.587158 28.317868 cg05096415 45.58716
## 9 18.272375 12.774648 45.585243 cg02225060 45.58524
## 10 45.324126 8.651476 21.700847 cg14710850 45.32413
## 11 23.097250 5.920308 45.302134 cg02981548 45.30213
## 12 44.863400 0.000000 36.602318 cg08861434 44.86340
## 13 14.450136 42.014698 10.561941 cg03129555 42.01470
## 14 41.988198 6.875109 20.296660 cg23432430 41.98820
## 15 8.925156 17.010385 41.736553 cg16749614 41.73655
## 16 3.590291 40.130037 25.168269 cg17186592 40.13004
## 17 1.854950 38.982454 23.221432 cg14924512 38.98245
## 18 38.239989 7.576263 15.081591 cg09584650 38.23999
## 19 13.558156 38.083288 11.893309 cg06864789 38.08329
## 20 19.822088 37.834312 3.055189 cg03084184 37.83431
## [1] "The top 20 features by maximum importance:"
## [1] "PC1" "PC2" "cg00962106" "PC3" "cg19503462" "cg27452255" "cg07152869"
## [8] "cg05096415" "cg02225060" "cg14710850" "cg02981548" "cg08861434" "cg03129555" "cg23432430"
## [15] "cg16749614" "cg17186592" "cg14924512" "cg09584650" "cg06864789" "cg03084184"
if (METHOD_FEATURE_FLAG == 5) {
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  print(roc_curve)
  print("The AUC value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG == 6) {
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "Dementia"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  print(roc_curve)
  print("The AUC value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 3) {
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  print(roc_curve)
  print("The AUC value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1) {
  # One-versus-rest ROC: for each class, treat that class as the positive
  # label and all other classes as negative.
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)
  for (class in classes) {
    binary_labels <- ifelse(testData$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  # Draw the first curve with col = 2 so the colors match the legend
  # (lines() below uses col = i + 1, and the legend uses 2:(n + 1)).
  plot(roc_curves[[1]], col = 2, lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = 2:(length(classes) + 1), lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.8505
## The AUC value for class CN is: 0.850513
##
## Class: Dementia
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.8357
## The AUC value for class Dementia is: 0.8357143
##
## Class: MCI
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.8188
## The AUC value for class MCI is: 0.8188266
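As a cross-check on the one-versus-rest loop above, pROC also ships a built-in multi-class AUC; a sketch, assuming pROC >= 1.16, which accepts a probability matrix whose column names match the class levels:

```r
# A sketch: Hand-Till style multi-class AUC straight from the probability matrix.
mc_roc <- multiclass.roc(testData$DX, as.matrix(prob_predictions))
print(mc_roc$auc)
```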
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
}
## The mean AUC value across all classes with one versus rest method is: 0.835018
df_ENM1<-processed_data
featureName_ENM1<-AfterProcess_FeatureName
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_ENM1$DX, p = 0.7, list = FALSE)
trainData_ENM1 <- df_ENM1[trainIndex, ]
testData_ENM1 <- df_ENM1[-trainIndex, ]
ctrl <- trainControl(method = "cv", number = 5)
param_grid <- expand.grid(alpha = 0:1, lambda = seq(0.001, 1, length = 20))
elastic_net_model1 <- caret::train(DX ~ ., data = trainData_ENM1, method = "glmnet",
trControl = ctrl, tuneGrid = param_grid)
print(elastic_net_model1)
## glmnet
##
## 455 samples
## 155 predictors
## 3 classes: 'CN', 'Dementia', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 364, 365, 363, 364, 364
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0 0.00100000 0.6571736 0.42345797
## 0 0.05357895 0.6725349 0.43439423
## 0 0.10615789 0.6747338 0.43094148
## 0 0.15873684 0.6725599 0.42391171
## 0 0.21131579 0.6725837 0.41818370
## 0 0.26389474 0.6770526 0.42406079
## 0 0.31647368 0.6769804 0.41856449
## 0 0.36905263 0.6726087 0.40853473
## 0 0.42163158 0.6638170 0.38542265
## 0 0.47421053 0.6660148 0.38902178
## 0 0.52678947 0.6594214 0.37628816
## 0 0.57936842 0.6550252 0.36510400
## 0 0.63194737 0.6528274 0.35927177
## 0 0.68452632 0.6418618 0.33471759
## 0 0.73710526 0.6352200 0.31832804
## 0 0.78968421 0.6307756 0.30720022
## 0 0.84226316 0.6263800 0.29777058
## 0 0.89484211 0.6220322 0.28739881
## 0 0.94742105 0.6220322 0.28739881
## 0 1.00000000 0.6220322 0.28682520
## 1 0.00100000 0.6240596 0.37352512
## 1 0.05357895 0.5187546 0.05457313
## 1 0.10615789 0.5142862 0.00000000
## 1 0.15873684 0.5142862 0.00000000
## 1 0.21131579 0.5142862 0.00000000
## 1 0.26389474 0.5142862 0.00000000
## 1 0.31647368 0.5142862 0.00000000
## 1 0.36905263 0.5142862 0.00000000
## 1 0.42163158 0.5142862 0.00000000
## 1 0.47421053 0.5142862 0.00000000
## 1 0.52678947 0.5142862 0.00000000
## 1 0.57936842 0.5142862 0.00000000
## 1 0.63194737 0.5142862 0.00000000
## 1 0.68452632 0.5142862 0.00000000
## 1 0.73710526 0.5142862 0.00000000
## 1 0.78968421 0.5142862 0.00000000
## 1 0.84226316 0.5142862 0.00000000
## 1 0.89484211 0.5142862 0.00000000
## 1 0.94742105 0.5142862 0.00000000
## 1 1.00000000 0.5142862 0.00000000
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0 and lambda = 0.2638947.
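Note that `alpha = 0:1` only evaluates the two endpoints, ridge (alpha = 0) and lasso (alpha = 1), so no intermediate elastic-net mix is ever tried; the selected alpha = 0 is plain ridge regression. A sketch of a finer, hypothetical grid:

```r
# A sketch (hypothetical grid): include intermediate alpha values so genuine
# elastic-net mixes compete with the ridge and lasso endpoints.
param_grid_fine <- expand.grid(alpha = seq(0, 1, by = 0.25),
                               lambda = seq(0.001, 1, length = 20))
enet_fine <- caret::train(DX ~ ., data = trainData_ENM1, method = "glmnet",
                          trControl = ctrl, tuneGrid = param_grid_fine)
enet_fine$bestTune  # best (alpha, lambda) pair under 5-fold CV
```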
mean_accuracy_elastic_net_model1 <- mean(elastic_net_model1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_elastic_net_model1)
## [1] 0.5868408
FeatEval_Median_mean_accuracy_cv_ENM1<-mean_accuracy_elastic_net_model1
print(FeatEval_Median_mean_accuracy_cv_ENM1)
## [1] 0.5868408
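The grid-mean accuracy above averages over many deliberately poor tuning settings (e.g. the lasso rows with Kappa = 0), so it understates the model; the cross-validated accuracy of the selected tuning is often the more informative summary. A sketch:

```r
# A sketch: pull the cross-validated accuracy of the selected (alpha, lambda)
# pair rather than averaging over the whole tuning grid.
best_row <- merge(elastic_net_model1$bestTune, elastic_net_model1$results)
print(best_row$Accuracy)
```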
train_predictions <- predict(elastic_net_model1, newdata = trainData_ENM1, type = "raw")
train_accuracy <- mean(train_predictions == trainData_ENM1$DX)
FeatEval_Median_ENM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.863736263736264"
print(FeatEval_Median_ENM1_trainAccuracy)
## [1] 0.8637363
predictions <- predict(elastic_net_model1, newdata = testData_ENM1)
cm_FeatEval_Median_ENM1 <- caret::confusionMatrix(predictions,testData_ENM1$DX)
print(cm_FeatEval_Median_ENM1)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN Dementia MCI
## CN 45 5 13
## Dementia 0 8 0
## MCI 21 15 86
##
## Overall Statistics
##
## Accuracy : 0.7202
## 95% CI : (0.6512, 0.7823)
## No Information Rate : 0.513
## P-Value [Acc > NIR] : 3.473e-09
##
## Kappa : 0.4987
##
## Mcnemar's Test P-Value : 6.901e-05
##
## Statistics by Class:
##
## Class: CN Class: Dementia Class: MCI
## Sensitivity 0.6818 0.28571 0.8687
## Specificity 0.8583 1.00000 0.6170
## Pos Pred Value 0.7143 1.00000 0.7049
## Neg Pred Value 0.8385 0.89189 0.8169
## Prevalence 0.3420 0.14508 0.5130
## Detection Rate 0.2332 0.04145 0.4456
## Detection Prevalence 0.3264 0.04145 0.6321
## Balanced Accuracy 0.7700 0.64286 0.7429
cm_FeatEval_Median_ENM1_Accuracy<-cm_FeatEval_Median_ENM1$overall["Accuracy"]
cm_FeatEval_Median_ENM1_Kappa<-cm_FeatEval_Median_ENM1$overall["Kappa"]
print(cm_FeatEval_Median_ENM1_Accuracy)
## Accuracy
## 0.7202073
print(cm_FeatEval_Median_ENM1_Kappa)
## Kappa
## 0.4986772
importance_elastic_net_model1<- varImp(elastic_net_model1)
print(importance_elastic_net_model1)
## glmnet variable importance
##
## variables are sorted by maximum importance across the classes
## only 20 most important variables shown (out of 155)
##
## CN Dementia MCI
## PC1 86.62 100.000 13.317
## PC2 68.41 88.617 20.144
## cg00962106 72.96 12.359 60.542
## cg02225060 43.13 18.831 62.028
## cg02981548 49.96 8.975 59.003
## cg23432430 57.29 15.755 41.467
## cg14710850 54.50 8.365 46.074
## cg16749614 20.68 33.681 54.423
## cg07152869 48.29 54.287 5.935
## cg08857872 29.00 24.415 53.478
## cg16652920 27.03 25.381 52.479
## cg26948066 51.16 42.093 9.005
## PC3 12.10 38.688 50.850
## cg08861434 48.60 1.034 49.702
## cg27452255 49.50 29.759 19.674
## cg09584650 48.11 20.547 27.502
## cg11133939 31.91 15.802 47.780
## cg19503462 47.24 44.920 2.255
## cg06864789 20.57 46.482 25.848
## cg02372404 30.74 14.685 45.487
plot(importance_elastic_net_model1, top = 20, main = "Variable Importance Plot")
importance_elastic_net_model1_df<-importance_elastic_net_model1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG ==4 || METHOD_FEATURE_FLAG==5 ||METHOD_FEATURE_FLAG==6 ){
# varImp on a glmnet finalModel needs the selected lambda to extract coefficients
importance_elastic_net_final_model1 <- varImp(elastic_net_model1$finalModel, lambda = elastic_net_model1$bestTune$lambda)
library(dplyr)
Ordered_importance_elastic_net_final_model1 <- importance_elastic_net_final_model1 %>% arrange(desc(Overall))
print(Ordered_importance_elastic_net_final_model1)
}
if(METHOD_FEATURE_FLAG==1){
# for the multi classification case,
# for each feature, we will choose the maximum importance value
# Add a column for the maximum importance
importance_elastic_net_model1_df$Feature<-rownames(importance_elastic_net_model1_df)
importance_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_elastic_net_model1_df)
}
## CN Dementia MCI Feature MaxImportance
## 1 86.61856914 1.000000e+02 13.3172756 PC1 100.0000000
## 2 68.40923356 8.861718e+01 20.1437865 PC2 88.6171753
## 3 72.96469699 1.235880e+01 60.5417410 cg00962106 72.9646970
## 4 43.13240373 1.883119e+01 62.0277500 cg02225060 62.0277500
## 5 49.96445547 8.974834e+00 59.0034449 cg02981548 59.0034449
## 6 57.28594595 1.575519e+01 41.4665989 cg23432430 57.2859460
## 7 54.50276255 8.364596e+00 46.0740113 cg14710850 54.5027626
## 8 20.67722621 3.368145e+01 54.4228308 cg16749614 54.4228308
## 9 48.28784196 5.428685e+01 5.9348530 cg07152869 54.2868503
## 10 28.99955065 2.441458e+01 53.4782904 cg08857872 53.4782904
## 11 27.03411042 2.538099e+01 52.4792605 cg16652920 52.4792605
## 12 51.16151095 4.209271e+01 9.0046416 cg26948066 51.1615110
## 13 12.09810917 3.868799e+01 50.8502580 PC3 50.8502580
## 14 48.60398248 1.033556e+00 49.7016936 cg08861434 49.7016936
## 15 49.49760764 2.975942e+01 19.6740347 cg27452255 49.4976076
## 16 48.11260158 2.054659e+01 27.5018605 cg09584650 48.1126016
## 17 31.91327401 1.580218e+01 47.7796126 cg11133939 47.7796126
## 18 47.23884683 4.492017e+01 2.2545244 cg19503462 47.2388468
## 19 20.56964371 4.648222e+01 25.8484212 cg06864789 46.4822202
## 20 30.73782371 1.468467e+01 45.4866533 cg02372404 45.4866533
## 21 13.69354785 4.531509e+01 31.5573838 cg24859648 45.3150869
## 22 10.38234270 3.471979e+01 45.1662858 cg14527649 45.1662858
## 23 44.71198271 3.266239e+01 11.9854379 cg03982462 44.7119827
## 24 43.77730816 1.498709e+01 28.7260663 cg06536614 43.7773082
## 25 0.05883057 4.329808e+01 43.1750913 cg17186592 43.2980771
## 26 26.35356177 1.675128e+01 43.1689936 cg26219488 43.1689936
## 27 42.96184106 1.407974e+01 28.8179482 cg10240127 42.9618411
## 28 13.43728479 4.289995e+01 29.3985098 cg00247094 42.8999499
## 29 35.47454508 6.858665e+00 42.3973656 cg20685672 42.3973656
## 30 3.59352609 4.215343e+01 38.4957495 cg25259265 42.1534309
## 31 42.14101201 1.425672e+01 27.8201416 cg16715186 42.1410120
## 32 0.72471522 4.194004e+01 41.1511653 cg05096415 41.9400358
## 33 34.83387329 4.176063e+01 6.8626049 cg15775217 41.7606334
## 34 15.96588577 4.058679e+01 24.5567515 cg24861747 40.5867925
## 35 34.02582240 6.216602e+00 40.3065798 cg07028768 40.3065798
## 36 4.42923611 3.973188e+01 35.2384836 cg14924512 39.7318750
## 37 24.97857945 3.964211e+01 14.5993802 cg03084184 39.6421149
## 38 4.47364133 3.907237e+01 34.5345690 cg05570109 39.0723656
## 39 34.87838267 4.000343e+00 38.9428813 cg01921484 38.9428813
## 40 9.76109106 2.779419e+01 37.6194357 cg00154902 37.6194357
## 41 28.32371241 3.743846e+01 9.0505944 cg26757229 37.4384621
## 42 37.35409876 9.845548e+00 27.4443952 cg03660162 37.3540988
## 43 35.87862161 5.228450e-01 36.4656219 cg12228670 36.4656219
## 44 4.41997575 3.173978e+01 36.2239098 cg00616572 36.2239098
## 45 14.11765143 3.616344e+01 21.9816350 cg20507276 36.1634417
## 46 5.45749304 3.544622e+01 29.9245672 cg05841700 35.4462155
## 47 21.86529704 1.351351e+01 35.4429584 cg06715136 35.4429584
## 48 22.83396655 1.227262e+01 35.1707449 cg02621446 35.1707449
## 49 18.36248346 3.501828e+01 16.5916380 cg12738248 35.0182767
## 50 14.22710960 3.493641e+01 20.6451501 cg09854620 34.9364150
## 51 32.22376446 3.481855e+01 2.5306279 cg00322003 34.8185476
## 52 8.08384230 2.660522e+01 34.7532141 cg24873924 34.7532141
## 53 14.18047579 3.469812e+01 20.4534883 cg03129555 34.6981194
## 54 34.67696776 7.589721e+00 27.0230912 cg04412904 34.6769678
## 55 15.01146046 1.956788e+01 34.6434941 cg17738613 34.6434941
## 56 18.92284513 1.558633e+01 34.5733334 cg25879395 34.5733334
## 57 34.33931078 1.088425e+01 23.3909061 cg05234269 34.3393108
## 58 22.74938839 3.407273e+01 11.2591886 cg20913114 34.0727323
## 59 1.10552969 3.256819e+01 33.7378725 cg02494911 33.7378725
## 60 17.46538209 3.350897e+01 15.9794332 cg00675157 33.5089705
## 61 26.90358032 3.346119e+01 6.4934510 cg12279734 33.4611866
## 62 12.80983950 2.054797e+01 33.4219666 cg01153376 33.4219666
## 63 30.28967663 2.971189e+00 33.3250209 cg04248279 33.3250209
## 64 30.64200051 3.320655e+01 2.5003923 cg06697310 33.2065481
## 65 19.19843226 1.362877e+01 32.8913537 cg16771215 32.8913537
## 66 25.57003042 3.288938e+01 7.2551937 cg26474732 32.8893794
## 67 1.21314270 3.269567e+01 31.4183733 cg12534577 32.6956712
## 68 14.55313738 3.243791e+01 17.8206148 cg06378561 32.4379075
## 69 19.18973566 1.316038e+01 32.4142742 cg18819889 32.4142742
## 70 29.77270902 3.221745e+01 2.3805896 cg01013522 32.2174539
## 71 8.93772126 2.320996e+01 32.2118330 cg10369879 32.2118330
## 72 31.33577329 9.313653e+00 21.9579652 cg03327352 31.3357733
## 73 31.29967812 8.695863e+00 22.5396602 cg07138269 31.2996781
## 74 30.28028219 7.130515e-01 31.0574889 cg12146221 31.0574889
## 75 31.01600323 1.154261e+01 19.4092367 cg11227702 31.0160032
## 76 30.51131704 2.020539e-01 30.7775262 cg27577781 30.7775262
## 77 30.73303248 2.929545e+01 1.3734260 cg02356645 30.7330325
## 78 10.88695480 1.960658e+01 30.5576906 cg15865722 30.5576906
## 79 21.12814755 3.052680e+01 9.3344960 cg18339359 30.5267988
## 80 21.72224653 3.049841e+01 8.7120057 cg08584917 30.4984075
## 81 30.47938340 1.623212e+01 14.1831103 cg15535896 30.4793834
## 82 9.34688271 3.034956e+01 20.9385240 cg01680303 30.3495620
## 83 0.66029731 2.956642e+01 30.2908744 cg01667144 30.2908744
## 84 17.55953473 2.993701e+01 12.3133187 cg07523188 29.9370087
## 85 12.71944904 1.708317e+01 29.8667786 cg21854924 29.8667786
## 86 9.98858571 2.974028e+01 19.6875423 cg10750306 29.7402832
## 87 5.72424689 2.961588e+01 23.8274785 cg16579946 29.6158807
## 88 29.45167177 5.867809e+00 23.5197079 cg11438323 29.4516718
## 89 7.89481912 2.936063e+01 21.4016585 cg18821122 29.3606329
## 90 13.47025445 1.551555e+01 29.0499630 cg01128042 29.0499630
## 91 12.43865614 1.650719e+01 29.0100007 cg14564293 29.0100007
## 92 28.70024447 4.432688e-01 28.1928204 cg08198851 28.7002445
## 93 25.92001930 2.699439e+00 28.6836137 cg00696044 28.6836137
## 94 28.64274404 7.484468e+00 21.0941208 cg17421046 28.6427440
## 95 28.22281533 1.423058e+01 13.9280826 cg11331837 28.2228153
## 96 4.57949131 2.318121e+01 27.8248553 cg12682323 27.8248553
## 97 27.75178280 2.314445e+01 4.5431752 cg02932958 27.7517828
## 98 2.23018238 2.770392e+01 25.4095774 cg23658987 27.7039151
## 99 13.54125012 1.405997e+01 27.6653736 cg07480176 27.6653736
## 100 18.99135526 8.561151e+00 27.6166619 cg10738648 27.6166619
## 101 23.24171267 4.223920e+00 27.5297883 cg03071582 27.5297883
## 102 27.50633915 1.371731e+01 13.7248754 cg25758034 27.5063392
## 103 8.31603304 1.850390e+01 26.8840917 cg06118351 26.8840917
## 104 26.47257188 2.668221e+01 0.1454858 cg19512141 26.6822130
## 105 15.77003229 2.662199e+01 10.7878011 cg23161429 26.6219887
## 106 13.97970860 2.639323e+01 12.3493705 cg11247378 26.3932344
## 107 18.58873769 7.684422e+00 26.3373146 cg20678988 26.3373146
## 108 14.36630731 1.154461e+01 25.9750683 cg27086157 25.9750683
## 109 25.84323391 9.775700e+00 16.0033784 cg03088219 25.8432339
## 110 13.62723686 2.527421e+01 11.5828134 cg22274273 25.2742056
## 111 2.73030181 2.236070e+01 25.1551567 cg13885788 25.1551567
## 112 7.96947771 1.668187e+01 24.7155013 cg14240646 24.7155013
## 113 23.64664608 7.870673e-01 24.4978687 cg06112204 24.4978687
## 114 24.37581284 4.909569e+00 19.4020885 cg17429539 24.3758128
## 115 23.05210026 2.435068e+01 1.2344261 cg25561557 24.3506817
## 116 21.11642846 3.134820e+00 24.3154035 cg14293999 24.3154035
## 117 15.52350094 8.639394e+00 24.2270504 cg19377607 24.2270504
## 118 21.13941824 2.411177e+01 2.9081939 cg06950937 24.1117674
## 119 24.09416705 4.091480e+00 19.9385319 cg25436480 24.0941671
## 120 14.61625621 9.014240e+00 23.6946512 cg00272795 23.6946512
## 121 10.00492471 1.338641e+01 23.4554942 cg12012426 23.4554942
## 122 23.38047442 1.718219e+01 6.1341289 cg05321907 23.3804744
## 123 23.15224963 9.972486e+00 13.1156084 cg20139683 23.1522496
## 124 0.72026425 2.312477e+01 22.3403456 cg26069044 23.1247652
## 125 21.02229411 2.241430e+01 1.3278555 cg23916408 22.4143048
## 126 0.60344482 2.222826e+01 21.5606612 cg27341708 22.2282613
## 127 15.96951803 2.220693e+01 6.1732548 cg13080267 22.2069281
## 128 21.85904275 1.296775e+00 20.4981121 cg27272246 21.8590428
## 129 0.95551426 2.184089e+01 20.8212223 cg12284872 21.8408918
## 130 2.40801550 2.169902e+01 19.2268469 cg00689685 21.6990177
## 131 2.00882816 2.152514e+01 19.4521562 cg16178271 21.5251397
## 132 21.27669027 8.123896e+00 13.0886385 cg21209485 21.2766903
## 133 20.58800214 1.058955e+01 9.9342921 cg24851651 20.5880021
## 134 20.33445521 7.327227e+00 12.9430731 cg21697769 20.3344552
## 135 20.32764749 6.212758e+00 14.0507346 cg04664583 20.3276475
## 136 14.63879152 1.993277e+01 5.2298202 cg00999469 19.9327670
## 137 2.26826077 1.742733e+01 19.7597458 cg20370184 19.7597458
## 138 18.98018869 4.183448e+00 14.7325852 cg11187460 18.9801887
## 139 18.43492519 1.997626e+00 16.3731437 cg12784167 18.4349252
## 140 1.20049655 1.698226e+01 18.2469088 cg02320265 18.2469088
## 141 17.49071207 1.357646e+01 3.8500929 cg12776173 17.4907121
## 142 17.27595583 1.270576e+00 15.9412248 cg08779649 17.2759558
## 143 8.18137127 8.987929e+00 17.2334557 cg01933473 17.2334557
## 144 17.18292416 8.949228e+00 8.1695414 cg15501526 17.1829242
## 145 13.77316373 1.693296e+01 3.0956441 cg10985055 16.9329631
## 146 16.16347361 6.749371e+00 9.3499476 cg17906851 16.1634736
## 147 11.29815350 4.706969e+00 16.0692781 cg14307563 16.0692781
## 148 4.33181096 1.431018e+01 9.9142134 cg16788319 14.3101796
## 149 11.34767824 1.384098e+01 2.4291435 cg24506579 13.8409770
## 150 9.52173130 1.242049e+01 2.8346083 cg27639199 12.4204949
## 151 1.91285375 1.029383e+01 12.2708378 cg12466610 12.2708378
## 152 9.00293247 2.188499e+00 11.2555867 cg15633912 11.2555867
## 153 0.00000000 1.116694e+01 11.2310930 cg01413796 11.2310930
## 154 1.45721713 1.876081e-01 1.7089805 cg01549082 1.7089805
## 155 0.70664419 5.164989e-03 0.7759644 age.now 0.7759644
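The introduction also mentions mean-based feature selection; a hypothetical sketch of that variant, aggregating by the mean importance across classes instead of the maximum (`MeanImportance` is an illustrative name; requires dplyr >= 1.0 for `across()`):

```r
# A sketch: rank features by their mean importance across the three classes.
importance_mean_df <- importance_elastic_net_model1_df %>%
  mutate(MeanImportance = rowMeans(across(c(CN, Dementia, MCI)))) %>%
  arrange(desc(MeanImportance))
head(importance_mean_df$Feature, 20)  # Top-20 under the mean criterion
```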
if(METHOD_FEATURE_FLAG == 1){
importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_elastic_net_model1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_elastic_net_model1_df %>% head(20))
print("the top 20 features based on max way:")
print(head(importance_elastic_net_model1_df,n=20)$Feature)
importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_elastic_net_model1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
## CN Dementia MCI Feature MaxImportance
## 1 86.61857 100.000000 13.317276 PC1 100.00000
## 2 68.40923 88.617175 20.143786 PC2 88.61718
## 3 72.96470 12.358801 60.541741 cg00962106 72.96470
## 4 43.13240 18.831191 62.027750 cg02225060 62.02775
## 5 49.96446 8.974834 59.003445 cg02981548 59.00344
## 6 57.28595 15.755192 41.466599 cg23432430 57.28595
## 7 54.50276 8.364596 46.074011 cg14710850 54.50276
## 8 20.67723 33.681449 54.422831 cg16749614 54.42283
## 9 48.28784 54.286850 5.934853 cg07152869 54.28685
## 10 28.99955 24.414585 53.478290 cg08857872 53.47829
## 11 27.03411 25.380995 52.479260 cg16652920 52.47926
## 12 51.16151 42.092714 9.004642 cg26948066 51.16151
## 13 12.09811 38.687994 50.850258 PC3 50.85026
## 14 48.60398 1.033556 49.701694 cg08861434 49.70169
## 15 49.49761 29.759418 19.674035 cg27452255 49.49761
## 16 48.11260 20.546586 27.501861 cg09584650 48.11260
## 17 31.91327 15.802183 47.779613 cg11133939 47.77961
## 18 47.23885 44.920167 2.254524 cg19503462 47.23885
## 19 20.56964 46.482220 25.848421 cg06864789 46.48222
## 20 30.73782 14.684674 45.486653 cg02372404 45.48665
## [1] "the top 20 features based on max way:"
## [1] "PC1" "PC2" "cg00962106" "cg02225060" "cg02981548" "cg23432430" "cg14710850"
## [8] "cg16749614" "cg07152869" "cg08857872" "cg16652920" "cg26948066" "PC3" "cg08861434"
## [15] "cg27452255" "cg09584650" "cg11133939" "cg19503462" "cg06864789" "cg02372404"
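The melt warning can be avoided entirely by reshaping with tidyr instead of reshape2; a sketch of the equivalent long-format step (assuming tidyr is installed):

```r
# A sketch: tidyr::pivot_longer() replaces the deprecated reshape2::melt()
# redirection and yields the same Feature/Class/Importance long format.
library(tidyr)
importance_long <- importance_elastic_net_model1_df %>%
  head(20) %>%
  dplyr::select(-MaxImportance) %>%
  pivot_longer(cols = c(CN, Dementia, MCI),
               names_to = "Class", values_to = "Importance")
```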
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_ENM1_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_ENM1_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_ENM1_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==1){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData_ENM1$DX)
for (class in classes) {
binary_labels <- ifelse(testData_ENM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.8682
## The AUC value for class CN is: 0.8681699
##
## Class: Dementia
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.8656
## The AUC value for class Dementia is: 0.8655844
##
## Class: MCI
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.8361
## The AUC value for class MCI is: 0.8361272
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Median_ENM1_AUC <- mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.8566272
print(FeatEval_Median_ENM1_AUC)
## [1] 0.8566272
library(caret)
library(xgboost)
library(dplyr)
library(doParallel)
numCores <- detectCores() - 1
c2 <- makeCluster(numCores)
registerDoParallel(c2)
df_XGB1<-processed_data
featureName_XGB1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_XGB1$DX, p = 0.7, list = FALSE)
trainData_XGB1<- df_XGB1[trainIndex, ]
testData_XGB1 <- df_XGB1[-trainIndex, ]
cv_control <- trainControl(method = "cv", number = 5, allowParallel = TRUE)
xgb_model <- caret::train(
DX ~ ., data = trainData_XGB1,
method = "xgbTree", trControl = cv_control,
metric = "Accuracy"
)
print(xgb_model)
## eXtreme Gradient Boosting
##
## 455 samples
## 155 predictors
## 3 classes: 'CN', 'Dementia', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 364, 365, 363, 364, 364
## Resampling results across tuning parameters:
##
## eta max_depth colsample_bytree subsample nrounds Accuracy Kappa
## 0.3 1 0.6 0.50 50 0.5736062 0.2133427
## 0.3 1 0.6 0.50 100 0.5582439 0.2085935
## 0.3 1 0.6 0.50 150 0.5625917 0.2244108
## 0.3 1 0.6 0.75 50 0.5407347 0.1472015
## 0.3 1 0.6 0.75 100 0.5540176 0.1792182
## 0.3 1 0.6 0.75 150 0.5628333 0.2063006
## 0.3 1 0.6 1.00 50 0.5231746 0.1106318
## 0.3 1 0.6 1.00 100 0.5474232 0.1678977
## 0.3 1 0.6 1.00 150 0.5606339 0.2029962
## 0.3 1 0.8 0.50 50 0.5890880 0.2424088
## 0.3 1 0.8 0.50 100 0.5934124 0.2672959
## 0.3 1 0.8 0.50 150 0.5868408 0.2642353
## 0.3 1 0.8 0.75 50 0.5430514 0.1555119
## 0.3 1 0.8 0.75 100 0.5406843 0.1665257
## 0.3 1 0.8 0.75 150 0.5605617 0.2088901
## 0.3 1 0.8 1.00 50 0.5342358 0.1288615
## 0.3 1 0.8 1.00 100 0.5407804 0.1556951
## 0.3 1 0.8 1.00 150 0.5584116 0.2014657
## 0.3 2 0.6 0.50 50 0.5538228 0.1995945
## 0.3 2 0.6 0.50 100 0.5692796 0.2243763
## 0.3 2 0.6 0.50 150 0.5736752 0.2365566
## 0.3 2 0.6 0.75 50 0.5494283 0.1713509
## 0.3 2 0.6 0.75 100 0.5736768 0.2237137
## 0.3 2 0.6 0.75 150 0.5604895 0.2040200
## 0.3 2 0.6 1.00 50 0.5385587 0.1501599
## 0.3 2 0.6 1.00 100 0.5516255 0.1817362
## 0.3 2 0.6 1.00 150 0.5737007 0.2208991
## 0.3 2 0.8 0.50 50 0.5625195 0.2024290
## 0.3 2 0.8 0.50 100 0.5779763 0.2363713
## 0.3 2 0.8 0.50 150 0.5824197 0.2476196
## 0.3 2 0.8 0.75 50 0.5867920 0.2509766
## 0.3 2 0.8 0.75 100 0.5934342 0.2573927
## 0.3 2 0.8 0.75 150 0.5846902 0.2431550
## 0.3 2 0.8 1.00 50 0.5451999 0.1662079
## 0.3 2 0.8 1.00 100 0.5627834 0.1963118
## 0.3 2 0.8 1.00 150 0.5539683 0.1831114
## 0.3 3 0.6 0.50 50 0.5649350 0.1945959
## 0.3 3 0.6 0.50 100 0.5713351 0.2177009
## 0.3 3 0.6 0.50 150 0.5846435 0.2435426
## 0.3 3 0.6 0.75 50 0.5582439 0.1912471
## 0.3 3 0.6 0.75 100 0.5693056 0.2148966
## 0.3 3 0.6 0.75 150 0.5627849 0.2058365
## 0.3 3 0.6 1.00 50 0.5538971 0.1741107
## 0.3 3 0.6 1.00 100 0.5626889 0.1948407
## 0.3 3 0.6 1.00 150 0.5583416 0.1924154
## 0.3 3 0.8 0.50 50 0.5781712 0.2266612
## 0.3 3 0.8 0.50 100 0.5627117 0.2093377
## 0.3 3 0.8 0.50 150 0.5671322 0.2231162
## 0.3 3 0.8 0.75 50 0.5648134 0.1892441
## 0.3 3 0.8 0.75 100 0.5889659 0.2431679
## 0.3 3 0.8 0.75 150 0.5802469 0.2286026
## 0.3 3 0.8 1.00 50 0.5671567 0.2005874
## 0.3 3 0.8 1.00 100 0.5671567 0.2028428
## 0.3 3 0.8 1.00 150 0.5781462 0.2285191
## 0.4 1 0.6 0.50 50 0.5428810 0.1759217
## 0.4 1 0.6 0.50 100 0.5472050 0.1993525
## 0.4 1 0.6 0.50 150 0.5560694 0.2141393
## 0.4 1 0.6 0.75 50 0.5342602 0.1478947
## 0.4 1 0.6 0.75 100 0.5870813 0.2513222
## 0.4 1 0.6 0.75 150 0.5828056 0.2515964
## 0.4 1 0.6 1.00 50 0.5386797 0.1427571
## 0.4 1 0.6 1.00 100 0.5605850 0.2101882
## 0.4 1 0.6 1.00 150 0.5583384 0.2008624
## 0.4 1 0.8 0.50 50 0.5561193 0.1990791
## 0.4 1 0.8 0.50 100 0.5493794 0.1975600
## 0.4 1 0.8 0.50 150 0.5537978 0.2081486
## 0.4 1 0.8 0.75 50 0.5473759 0.1714033
## 0.4 1 0.8 0.75 100 0.5518182 0.1871054
## 0.4 1 0.8 0.75 150 0.5759229 0.2413895
## 0.4 1 0.8 1.00 50 0.5539927 0.1727786
## 0.4 1 0.8 1.00 100 0.5650295 0.2081764
## 0.4 1 0.8 1.00 150 0.5627834 0.2115588
## 0.4 2 0.6 0.50 50 0.5736529 0.2303916
## 0.4 2 0.6 0.50 100 0.5670829 0.2220532
## 0.4 2 0.6 0.50 150 0.5758024 0.2386951
## 0.4 2 0.6 0.75 50 0.5584361 0.1926618
## 0.4 2 0.6 0.75 100 0.5761661 0.2324351
## 0.4 2 0.6 0.75 150 0.5716733 0.2229305
## 0.4 2 0.6 1.00 50 0.5516494 0.1728691
## 0.4 2 0.6 1.00 100 0.5736046 0.2204737
## 0.4 2 0.6 1.00 150 0.5802707 0.2332756
## 0.4 2 0.8 0.50 50 0.5562388 0.1972760
## 0.4 2 0.8 0.50 100 0.5539199 0.1963172
## 0.4 2 0.8 0.50 150 0.5583394 0.2088900
## 0.4 2 0.8 0.75 50 0.5452015 0.1706634
## 0.4 2 0.8 0.75 100 0.5649090 0.2141284
## 0.4 2 0.8 0.75 150 0.5672034 0.2176628
## 0.4 2 0.8 1.00 50 0.5671561 0.2076591
## 0.4 2 0.8 1.00 100 0.5824685 0.2350920
## 0.4 2 0.8 1.00 150 0.5845941 0.2462171
## 0.4 3 0.6 0.50 50 0.5823719 0.2396678
## 0.4 3 0.6 0.50 100 0.6021776 0.2854283
## 0.4 3 0.6 0.50 150 0.5735563 0.2354364
## 0.4 3 0.6 0.75 50 0.5848123 0.2435805
## 0.4 3 0.6 0.75 100 0.5760195 0.2307819
## 0.4 3 0.6 0.75 150 0.5780968 0.2322680
## 0.4 3 0.6 1.00 50 0.5868174 0.2402022
## 0.4 3 0.6 1.00 100 0.5890625 0.2540586
## 0.4 3 0.6 1.00 150 0.5869125 0.2498853
## 0.4 3 0.8 0.50 50 0.5627605 0.2116175
## 0.4 3 0.8 0.50 100 0.5891591 0.2611647
## 0.4 3 0.8 0.50 150 0.5869613 0.2567942
## 0.4 3 0.8 0.75 50 0.5868142 0.2422454
## 0.4 3 0.8 0.75 100 0.5889643 0.2511938
## 0.4 3 0.8 0.75 150 0.5889887 0.2505491
## 0.4 3 0.8 1.00 50 0.5650804 0.1936825
## 0.4 3 0.8 1.00 100 0.5627377 0.1949545
## 0.4 3 0.8 1.00 150 0.5715294 0.2145625
##
## Tuning parameter 'gamma' was held constant at a value of 0
## Tuning parameter
## 'min_child_weight' was held constant at a value of 1
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were nrounds = 100, max_depth = 3, eta = 0.4, gamma =
## 0, colsample_bytree = 0.6, min_child_weight = 1 and subsample = 0.5.
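caret's default `xgbTree` grid is large; once a good region is known, a pinned grid keeps refits fast. A sketch, with hypothetical values copied from the selection above:

```r
# A sketch (hypothetical, values taken from the tuning result above):
# pin the xgbTree grid to a single setting instead of caret's default sweep.
xgb_grid <- expand.grid(nrounds = 100, max_depth = 3, eta = 0.4, gamma = 0,
                        colsample_bytree = 0.6, min_child_weight = 1,
                        subsample = 0.5)
xgb_refit <- caret::train(DX ~ ., data = trainData_XGB1, method = "xgbTree",
                          trControl = cv_control, tuneGrid = xgb_grid,
                          metric = "Accuracy")
```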
mean_accuracy_xgb_model<- mean(xgb_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_xgb_model)
## [1] 0.5662996
FeatEval_Median_mean_accuracy_cv_xgb<-mean_accuracy_xgb_model
print(FeatEval_Median_mean_accuracy_cv_xgb)
## [1] 0.5662996
train_predictions <- predict(xgb_model, newdata = trainData_XGB1, type = "raw")
train_accuracy <- mean(train_predictions == trainData_XGB1$DX)
FeatEval_Median_xgb_trainAccuracy <- train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
print(FeatEval_Median_xgb_trainAccuracy)
## [1] 1
predictions <- predict(xgb_model, newdata = testData_XGB1)
cm_FeatEval_Median_xgb <-caret::confusionMatrix(predictions,testData_XGB1$DX)
print(cm_FeatEval_Median_xgb)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN Dementia MCI
## CN 33 8 15
## Dementia 3 6 2
## MCI 30 14 82
##
## Overall Statistics
##
## Accuracy : 0.6269
## 95% CI : (0.5546, 0.6953)
## No Information Rate : 0.513
## P-Value [Acc > NIR] : 0.0009246
##
## Kappa : 0.331
##
## Mcnemar's Test P-Value : 0.0009969
##
## Statistics by Class:
##
## Class: CN Class: Dementia Class: MCI
## Sensitivity 0.5000 0.21429 0.8283
## Specificity 0.8189 0.96970 0.5319
## Pos Pred Value 0.5893 0.54545 0.6508
## Neg Pred Value 0.7591 0.87912 0.7463
## Prevalence 0.3420 0.14508 0.5130
## Detection Rate 0.1710 0.03109 0.4249
## Detection Prevalence 0.2902 0.05699 0.6528
## Balanced Accuracy 0.6594 0.59199 0.6801
cm_FeatEval_Median_xgb_Accuracy <-cm_FeatEval_Median_xgb$overall["Accuracy"]
cm_FeatEval_Median_xgb_Kappa <-cm_FeatEval_Median_xgb$overall["Kappa"]
print(cm_FeatEval_Median_xgb_Accuracy)
## Accuracy
## 0.626943
print(cm_FeatEval_Median_xgb_Kappa)
## Kappa
## 0.3309903
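Besides the overall accuracy and kappa pulled out above, caret's `confusionMatrix` object also stores per-class statistics in its `byClass` slot; a small sketch using the `cm_FeatEval_Median_xgb` object from above:

```r
# For a multi-class confusion matrix, byClass is a matrix with one row
# per class ("Class: CN", "Class: Dementia", "Class: MCI").
per_class_stats <- cm_FeatEval_Median_xgb$byClass
print(per_class_stats[, c("Sensitivity", "Specificity", "Balanced Accuracy")])
# Macro-averaged balanced accuracy across the three classes
print(mean(per_class_stats[, "Balanced Accuracy"]))
```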
importance_xgb_model<- varImp(xgb_model)
print(importance_xgb_model)
## xgbTree variable importance
##
## only 20 most important variables shown (out of 155)
##
## Overall
## age.now 100.00
## cg15501526 91.95
## cg16771215 86.45
## cg05234269 86.13
## cg25259265 75.79
## cg01921484 68.79
## cg03088219 68.21
## cg02981548 67.80
## cg00962106 66.03
## cg16652920 65.81
## cg01667144 63.79
## cg08857872 62.34
## cg07152869 61.76
## cg26948066 60.94
## cg01153376 59.72
## cg00154902 59.63
## cg10369879 59.32
## cg03084184 59.12
## cg18821122 56.96
## cg06864789 56.17
plot(importance_xgb_model, top = 20, main = "Variable Importance Plot")
importance_xgb_model_df<-importance_xgb_model$importance
importance <- xgb.importance(model = xgb_model$finalModel)
xgb.plot.importance(importance_matrix = importance)
ordered_importance <- importance[order(-importance$Importance), ]
print(ordered_importance)
## Feature Gain Cover Frequency Importance
## <char> <num> <num> <num> <num>
## 1: age.now 2.112743e-02 0.0185555881 0.0107066381 2.112743e-02
## 2: cg15501526 1.943013e-02 0.0172464912 0.0092790864 1.943013e-02
## 3: cg16771215 1.827025e-02 0.0148549078 0.0114204140 1.827025e-02
## 4: cg05234269 1.820234e-02 0.0133119992 0.0071377587 1.820234e-02
## 5: cg25259265 1.602252e-02 0.0112318961 0.0064239829 1.602252e-02
## ---
## 151: cg06112204 5.356297e-04 0.0008945313 0.0014275517 5.356297e-04
## 152: cg20370184 3.587610e-04 0.0006866937 0.0028551035 3.587610e-04
## 153: cg03071582 2.521750e-04 0.0009505033 0.0021413276 2.521750e-04
## 154: PC2 2.233942e-04 0.0005041107 0.0014275517 2.233942e-04
## 155: cg12466610 3.889293e-05 0.0001385521 0.0007137759 3.889293e-05
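Since the downstream Pareto step keeps only the top-ranked features, the ordered importance table can be turned into a feature list directly; a sketch, where `Top_N` is a hypothetical cutoff:

```r
# xgb.importance returns features sorted by Gain in decreasing order,
# so the head of the table is the candidate feature set.
Top_N <- 50  # hypothetical cutoff; set to the desired number of CpGs
top_features_xgb <- head(importance$Feature, Top_N)
print(top_features_xgb)
```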
stopCluster(c2)
registerDoSEQ()
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_xgb_AUC <-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_xgb_AUC <-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_xgb_AUC <-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData_XGB1$DX)
for (class in classes) {
binary_labels <- ifelse(testData_XGB1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
cols <- seq_along(classes) + 1  # distinct palette colors for each curve
plot(roc_curves[[1]], col = cols[1],
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = cols[i], lwd = 2)
}
legend("bottomright", legend = classes, col = cols, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.7048
## The AUC value for class CN is: 0.7048437
##
## Class: Dementia
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.69
## The AUC value for class Dementia is: 0.6900433
##
## Class: MCI
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.6976
## The AUC value for class MCI is: 0.6976144
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Median_xgb_AUC <-mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.6975005
print(FeatEval_Median_xgb_AUC)
## [1] 0.6975005
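As a cross-check on the manual one-vs-rest loop, pROC also offers a built-in multi-class AUC (the average of pairwise one-vs-one AUCs, after Hand and Till), so the two summaries need not match exactly; a sketch reusing the probability matrix from the block above:

```r
# multiclass.roc accepts a matrix/data.frame of class probabilities
# whose columns match the factor levels of the response.
mroc <- pROC::multiclass.roc(testData_XGB1$DX, prob_predictions)
print(mroc$auc)
```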
library(caret)
library(randomForest)
df_RFM1<-processed_data
featureName_RFM1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_RFM1$DX, p = 0.7, list = FALSE)
train_data_RFM1 <- df_RFM1[trainIndex, ]
test_data_RFM1 <- df_RFM1[-trainIndex, ]
X_train_RFM1 <- subset(train_data_RFM1, select = -DX)
y_train_RFM1 <- train_data_RFM1$DX
X_test_RFM1 <- subset(test_data_RFM1, select = -DX)
y_test_RFM1 <- test_data_RFM1$DX
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)
rf_model <- caret::train(
DX ~ ., data = train_data_RFM1,
method = "rf", trControl = ctrl,
metric = "Accuracy",
importance = TRUE
)
print(rf_model)
## Random Forest
##
## 455 samples
## 155 predictors
## 3 classes: 'CN', 'Dementia', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 364, 365, 363, 364, 364
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.5209529 0.02147335
## 78 0.5516499 0.12431050
## 155 0.5560227 0.12959840
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 155.
mean_accuracy_rf_model<- mean(rf_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_rf_model)
## [1] 0.5428752
FeatEval_Median_mean_accuracy_cv_rf<-mean_accuracy_rf_model
print(FeatEval_Median_mean_accuracy_cv_rf)
## [1] 0.5428752
train_predictions <- predict(rf_model, newdata = train_data_RFM1, type = "raw")
train_accuracy <- mean(train_predictions == train_data_RFM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
FeatEval_Median_rf_trainAccuracy<-train_accuracy
print(FeatEval_Median_rf_trainAccuracy)
## [1] 1
predictions <- predict(rf_model, newdata = test_data_RFM1)
cm_FeatEval_Median_rf<-caret::confusionMatrix(predictions,test_data_RFM1$DX)
print(cm_FeatEval_Median_rf)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN Dementia MCI
## CN 19 7 12
## Dementia 0 0 0
## MCI 47 21 87
##
## Overall Statistics
##
## Accuracy : 0.5492
## 95% CI : (0.4761, 0.6208)
## No Information Rate : 0.513
## P-Value [Acc > NIR] : 0.1747
##
## Kappa : 0.1343
##
## Mcnemar's Test P-Value : 1.465e-10
##
## Statistics by Class:
##
## Class: CN Class: Dementia Class: MCI
## Sensitivity 0.28788 0.0000 0.8788
## Specificity 0.85039 1.0000 0.2766
## Pos Pred Value 0.50000 NaN 0.5613
## Neg Pred Value 0.69677 0.8549 0.6842
## Prevalence 0.34197 0.1451 0.5130
## Detection Rate 0.09845 0.0000 0.4508
## Detection Prevalence 0.19689 0.0000 0.8031
## Balanced Accuracy 0.56914 0.5000 0.5777
cm_FeatEval_Median_rf_Accuracy<-cm_FeatEval_Median_rf$overall["Accuracy"]
print(cm_FeatEval_Median_rf_Accuracy)
## Accuracy
## 0.5492228
cm_FeatEval_Median_rf_Kappa<-cm_FeatEval_Median_rf$overall["Kappa"]
print(cm_FeatEval_Median_rf_Kappa)
## Kappa
## 0.134306
importance_rf_model <- varImp(rf_model)
print(importance_rf_model)
## rf variable importance
##
## variables are sorted by maximum importance across the classes
## only 20 most important variables shown (out of 155)
##
## CN Dementia MCI
## cg15501526 76.69 35.05 100.00
## age.now 47.72 45.85 76.28
## cg01153376 25.95 44.39 66.76
## cg06864789 30.44 62.72 14.64
## cg25259265 21.71 53.34 44.55
## cg12279734 53.34 46.14 13.15
## cg00962106 31.82 26.72 47.89
## cg15775217 47.85 37.29 23.55
## cg00247094 13.93 46.90 30.08
## cg09584650 29.42 46.70 31.56
## cg20685672 44.88 21.77 22.14
## cg07028768 24.85 13.81 44.41
## cg14564293 43.75 37.11 41.38
## cg05096415 29.33 42.98 28.25
## cg20507276 20.97 42.88 35.73
## cg16652920 23.84 12.07 42.66
## cg01128042 29.81 15.40 42.37
## cg05234269 29.99 41.57 36.94
## cg01667144 28.92 28.13 41.52
## cg26069044 20.75 31.78 41.44
plot(importance_rf_model, top = 20, main = "Variable Importance Plot")
importance_rf_model_df<-importance_rf_model$importance
if(METHOD_FEATURE_FLAG==5 ){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(MCI))
print(Ordered_importance_rf_final_model)
}
if(METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==6 ){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(Dementia))
print(Ordered_importance_rf_final_model)
}
if(METHOD_FEATURE_FLAG==3 ){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(CI))
print(Ordered_importance_rf_final_model)
}
if(METHOD_FEATURE_FLAG==1){
# for the multi classification case,
# for each feature, we will choose the maximum importance value
# Add a column for the maximum importance
importance_rf_model_df$Feature<-rownames(importance_rf_model_df)
importance_rf_model_df <- importance_rf_model_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_rf_model_df)
}
## CN Dementia MCI Feature MaxImportance
## 1 76.6923099 35.053973 100.000000 cg15501526 100.00000
## 2 47.7155528 45.846153 76.278518 age.now 76.27852
## 3 25.9524160 44.393868 66.757310 cg01153376 66.75731
## 4 30.4368197 62.721527 14.644304 cg06864789 62.72153
## 5 21.7118768 53.342304 44.546949 cg25259265 53.34230
## 6 53.3366214 46.142628 13.150659 cg12279734 53.33662
## 7 31.8202922 26.717263 47.889227 cg00962106 47.88923
## 8 47.8486824 37.288439 23.548381 cg15775217 47.84868
## 9 13.9339856 46.900622 30.076314 cg00247094 46.90062
## 10 29.4219740 46.695776 31.560213 cg09584650 46.69578
## 11 44.8826960 21.768550 22.141293 cg20685672 44.88270
## 12 24.8468309 13.809502 44.406037 cg07028768 44.40604
## 13 43.7453305 37.110377 41.381288 cg14564293 43.74533
## 14 29.3320840 42.982247 28.245622 cg05096415 42.98225
## 15 20.9688504 42.882887 35.730460 cg20507276 42.88289
## 16 23.8446279 12.073446 42.661131 cg16652920 42.66113
## 17 29.8068016 15.399429 42.366636 cg01128042 42.36664
## 18 29.9885559 41.570634 36.937020 cg05234269 41.57063
## 19 28.9233442 28.131882 41.518101 cg01667144 41.51810
## 20 20.7459853 31.776151 41.443458 cg26069044 41.44346
## 21 27.6293426 40.871386 15.045489 cg01013522 40.87139
## 22 40.6837482 31.142879 24.998680 cg10240127 40.68375
## 23 14.7615919 40.620971 17.390796 cg09854620 40.62097
## 24 21.4395947 21.298022 40.483537 cg14240646 40.48354
## 25 40.4585390 28.565076 31.726655 cg11247378 40.45854
## 26 35.2263588 25.221218 40.430931 cg08857872 40.43093
## 27 29.5806004 21.639614 40.240150 cg17429539 40.24015
## 28 39.7359421 28.389481 31.332308 cg12776173 39.73594
## 29 14.5771133 26.625146 39.487658 cg14293999 39.48766
## 30 39.4502276 24.111916 29.108735 cg14710850 39.45023
## 31 7.0626094 39.019639 15.476174 cg16749614 39.01964
## 32 29.6583517 24.416012 38.897490 cg17738613 38.89749
## 33 27.9399920 38.653751 25.501184 cg07138269 38.65375
## 34 32.7092510 9.886174 38.523214 cg11187460 38.52321
## 35 38.4024400 32.177953 27.004240 cg10369879 38.40244
## 36 20.5454312 38.172773 38.261766 cg00154902 38.26177
## 37 13.8058741 38.207466 23.172401 cg19503462 38.20747
## 38 38.0187628 34.050182 17.865403 cg12228670 38.01876
## 39 37.2993793 35.104079 20.820271 cg01413796 37.29938
## 40 37.2767231 35.343056 29.001598 cg24861747 37.27672
## 41 20.5617528 37.263527 22.032914 cg04664583 37.26353
## 42 36.7092256 37.247823 11.616416 cg18819889 37.24782
## 43 26.2452981 25.114644 37.174705 cg26948066 37.17471
## 44 24.7888263 14.067769 37.079181 cg25879395 37.07918
## 45 27.9642639 36.766799 21.481868 cg25561557 36.76680
## 46 35.1299664 24.157696 36.742176 cg01921484 36.74218
## 47 31.9929330 27.438995 36.569860 cg02981548 36.56986
## 48 36.5546142 26.163146 33.587987 cg02225060 36.55461
## 49 12.1852932 18.781119 36.230248 cg15865722 36.23025
## 50 17.5108449 36.223769 22.633305 cg20913114 36.22377
## 51 11.8445071 36.205398 27.078280 cg12012426 36.20540
## 52 32.3957712 36.159733 29.995395 cg03084184 36.15973
## 53 32.6369694 15.770925 36.080998 cg10738648 36.08100
## 54 24.5945447 35.678602 35.999158 cg08861434 35.99916
## 55 35.9235758 22.004695 10.914928 cg03982462 35.92358
## 56 35.7889401 24.567638 24.186577 cg12146221 35.78894
## 57 26.9848456 35.383394 35.773419 cg27086157 35.77342
## 58 28.3767333 35.737187 32.778373 cg23161429 35.73719
## 59 35.5530586 28.523193 24.332180 cg05321907 35.55306
## 60 24.9052663 26.844921 35.358064 cg02621446 35.35806
## 61 22.8174056 31.668731 35.120230 PC1 35.12023
## 62 10.2440923 35.100606 26.696785 cg24873924 35.10061
## 63 10.1390641 24.645613 35.063982 cg02320265 35.06398
## 64 12.2961094 30.229128 35.017946 cg04248279 35.01795
## 65 19.1118773 34.658779 22.591672 cg00675157 34.65878
## 66 28.0565230 34.535561 0.000000 cg18339359 34.53556
## 67 24.3162728 34.507079 24.207233 cg00999469 34.50708
## 68 25.3401062 34.504523 23.477398 cg12534577 34.50452
## 69 34.4680648 19.685799 30.279707 PC2 34.46806
## 70 32.1984313 20.003301 34.347849 cg04412904 34.34785
## 71 34.2379960 33.891448 21.134538 cg02372404 34.23800
## 72 9.5926522 33.670583 19.463676 cg26474732 33.67058
## 73 29.7836802 24.996509 33.513906 cg15535896 33.51391
## 74 33.5021361 25.168032 21.381426 cg20139683 33.50214
## 75 33.4553005 23.403492 23.752550 cg13885788 33.45530
## 76 25.7402623 15.661728 33.435992 cg00696044 33.43599
## 77 24.5527336 24.506876 33.301503 cg18821122 33.30150
## 78 12.9207342 31.629984 33.293584 cg16771215 33.29358
## 79 19.3733484 27.138709 33.027811 cg17186592 33.02781
## 80 21.6900173 24.195451 32.649723 cg12738248 32.64972
## 81 24.4534514 26.529520 32.649513 cg23658987 32.64951
## 82 19.0738941 32.490892 18.472594 cg22274273 32.49089
## 83 15.9237341 14.289192 32.343544 cg25758034 32.34354
## 84 22.2692762 27.425613 31.747802 cg03327352 31.74780
## 85 26.6026298 18.823974 31.730728 cg11133939 31.73073
## 86 25.2898340 31.659280 23.256775 cg21209485 31.65928
## 87 28.2206526 31.654303 15.631104 cg02356645 31.65430
## 88 25.2545537 31.392922 17.764957 cg06536614 31.39292
## 89 19.4441777 25.193139 31.195570 cg24851651 31.19557
## 90 11.1124388 31.194517 26.448398 cg17906851 31.19452
## 91 6.8616647 27.121857 31.152981 cg06112204 31.15298
## 92 23.4993443 31.125283 19.681406 cg14527649 31.12528
## 93 22.9552695 30.421863 20.186686 cg26219488 30.42186
## 94 5.7595342 30.332789 23.060533 cg12284872 30.33279
## 95 18.2063080 30.067965 28.767916 cg03088219 30.06797
## 96 17.3967089 22.829945 30.016690 cg02494911 30.01669
## 97 12.7485088 29.958409 26.526103 cg12682323 29.95841
## 98 29.7889960 28.863462 27.932902 cg03071582 29.78900
## 99 26.4068755 29.568590 28.378530 cg10985055 29.56859
## 100 3.2101704 29.204393 14.095890 cg03129555 29.20439
## 101 29.0200111 15.549608 20.682407 cg00616572 29.02001
## 102 24.5139331 28.841978 28.902186 cg27341708 28.90219
## 103 19.8888388 7.102368 28.754540 cg15633912 28.75454
## 104 20.4842109 13.293009 28.568325 cg02932958 28.56832
## 105 28.4052159 8.971811 6.016374 cg06378561 28.40522
## 106 21.9280701 22.629981 28.311482 cg19377607 28.31148
## 107 17.1581799 28.298635 11.501730 cg11227702 28.29864
## 108 28.2398654 27.256889 19.929494 cg01680303 28.23987
## 109 27.9819786 12.035818 22.682467 cg06118351 27.98198
## 110 19.0419535 26.725237 27.738234 cg08198851 27.73823
## 111 13.7901367 26.104324 27.732541 PC3 27.73254
## 112 25.6708255 14.569499 27.706092 cg16715186 27.70609
## 113 14.5779068 27.692561 3.153006 cg00272795 27.69256
## 114 16.3139465 7.701157 27.652643 cg12784167 27.65264
## 115 0.8957984 27.592997 23.884488 cg14924512 27.59300
## 116 16.2600518 27.526059 26.359399 cg20678988 27.52606
## 117 22.2284073 27.513187 23.300478 cg23916408 27.51319
## 118 11.5655941 20.028505 27.458140 cg08779649 27.45814
## 119 27.2848766 15.314137 23.359170 cg07152869 27.28488
## 120 27.1149265 26.976261 19.540706 cg03660162 27.11493
## 121 19.7029012 22.955827 27.105937 cg19512141 27.10594
## 122 21.4209614 14.788613 27.004950 cg26757229 27.00495
## 123 22.0683430 7.923935 26.973580 cg00689685 26.97358
## 124 20.7773065 26.771188 25.171435 cg06950937 26.77119
## 125 20.7988609 26.347107 18.386969 cg06715136 26.34711
## 126 14.6761704 14.707912 26.314149 cg27272246 26.31415
## 127 7.6524253 26.276744 18.646134 cg24859648 26.27674
## 128 20.0815165 19.090487 26.217279 cg01933473 26.21728
## 129 15.8324823 16.951754 25.930281 cg21697769 25.93028
## 130 11.8201937 25.891902 23.329712 cg10750306 25.89190
## 131 25.7809473 23.550684 25.216088 cg23432430 25.78095
## 132 16.6259933 25.748940 25.747717 cg07523188 25.74894
## 133 9.0579625 25.661118 22.882843 cg07480176 25.66112
## 134 25.4265285 11.601544 13.411877 cg06697310 25.42653
## 135 21.6110435 25.155535 25.060500 cg27577781 25.15554
## 136 24.5023451 21.019052 16.860391 cg20370184 24.50235
## 137 10.1681090 14.831455 24.502091 cg16788319 24.50209
## 138 16.5321968 24.469138 12.302947 cg08584917 24.46914
## 139 24.3277743 11.685925 20.143261 cg05570109 24.32777
## 140 23.2146100 18.394328 7.510325 cg27452255 23.21461
## 141 21.2926607 23.057160 19.489054 cg24506579 23.05716
## 142 13.6215030 22.995195 18.351225 cg01549082 22.99519
## 143 22.9359146 20.481779 20.336097 cg11331837 22.93591
## 144 15.8523535 22.408087 12.218595 cg21854924 22.40809
## 145 22.2658369 4.384511 20.090717 cg11438323 22.26584
## 146 19.1366173 17.664706 21.810614 cg17421046 21.81061
## 147 21.0227393 18.873605 11.750042 cg16178271 21.02274
## 148 19.9206416 19.354994 18.389093 cg12466610 19.92064
## 149 18.3038153 19.892505 18.637289 cg05841700 19.89250
## 150 6.8288286 19.478153 17.614024 cg14307563 19.47815
## 151 19.2457721 13.327575 13.384317 cg25436480 19.24577
## 152 16.1924810 19.024224 18.409134 cg00322003 19.02422
## 153 10.2046806 17.707978 17.590101 cg13080267 17.70798
## 154 7.9240213 17.529300 9.739789 cg27639199 17.52930
## 155 13.7437050 16.592669 15.115893 cg16579946 16.59267
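The max-importance ranking above feeds directly into the two outputs this section promises, i.e. the filtered data frame handed to the Pareto-optimal step; a sketch, where `Top_N` is a hypothetical cutoff:

```r
# importance_rf_model_df is already sorted by MaxImportance, so the
# first Top_N rows give the selected feature names.
Top_N <- 20  # hypothetical cutoff
top_features_rf <- head(importance_rf_model_df$Feature, Top_N)
# Filtered data frame: selected features plus the phenotype column DX.
df_rf_filtered <- df_RFM1[, c(top_features_rf, "DX"), drop = FALSE]
str(df_rf_filtered)
```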
if(METHOD_FEATURE_FLAG == 1){
importance_melted_rf_model_df <- importance_rf_model_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_rf_model_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
## Warning: The melt generic in data.table has been passed a data.frame and will attempt to
## redirect to the relevant reshape2 method; please note that reshape2 is superseded and is no
## longer actively developed, and this redirection is now deprecated. To continue using melt
## methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the
## namespace, i.e. reshape2::melt(.). In the next version, this warning will become an error.
if(METHOD_FEATURE_FLAG == 1){
print(importance_rf_model_df %>% head(20))
print("the top 20 features based on max way:")
print(head(importance_rf_model_df,n=20)$Feature)
importance_melted_rf_model_df <- importance_rf_model_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_rf_model_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
## CN Dementia MCI Feature MaxImportance
## 1 76.69231 35.05397 100.00000 cg15501526 100.00000
## 2 47.71555 45.84615 76.27852 age.now 76.27852
## 3 25.95242 44.39387 66.75731 cg01153376 66.75731
## 4 30.43682 62.72153 14.64430 cg06864789 62.72153
## 5 21.71188 53.34230 44.54695 cg25259265 53.34230
## 6 53.33662 46.14263 13.15066 cg12279734 53.33662
## 7 31.82029 26.71726 47.88923 cg00962106 47.88923
## 8 47.84868 37.28844 23.54838 cg15775217 47.84868
## 9 13.93399 46.90062 30.07631 cg00247094 46.90062
## 10 29.42197 46.69578 31.56021 cg09584650 46.69578
## 11 44.88270 21.76855 22.14129 cg20685672 44.88270
## 12 24.84683 13.80950 44.40604 cg07028768 44.40604
## 13 43.74533 37.11038 41.38129 cg14564293 43.74533
## 14 29.33208 42.98225 28.24562 cg05096415 42.98225
## 15 20.96885 42.88289 35.73046 cg20507276 42.88289
## 16 23.84463 12.07345 42.66113 cg16652920 42.66113
## 17 29.80680 15.39943 42.36664 cg01128042 42.36664
## 18 29.98856 41.57063 36.93702 cg05234269 41.57063
## 19 28.92334 28.13188 41.51810 cg01667144 41.51810
## 20 20.74599 31.77615 41.44346 cg26069044 41.44346
## [1] "the top 20 features based on max way:"
## [1] "cg15501526" "age.now" "cg01153376" "cg06864789" "cg25259265" "cg12279734" "cg00962106"
## [8] "cg15775217" "cg00247094" "cg09584650" "cg20685672" "cg07028768" "cg14564293" "cg05096415"
## [15] "cg20507276" "cg16652920" "cg01128042" "cg05234269" "cg01667144" "cg26069044"
## Warning: The melt generic in data.table has been passed a data.frame and will attempt to
## redirect to the relevant reshape2 method; please note that reshape2 is superseded and is no
## longer actively developed, and this redirection is now deprecated. To continue using melt
## methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the
## namespace, i.e. reshape2::melt(.). In the next version, this warning will become an error.
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_rf_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_rf_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_rf_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(test_data_RFM1$DX)
for (class in classes) {
binary_labels <- ifelse(test_data_RFM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
cols <- seq_along(classes) + 1  # distinct palette colors for each curve
plot(roc_curves[[1]], col = cols[1],
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = cols[i], lwd = 2)
}
legend("bottomright", legend = classes, col = cols, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.6969
## The AUC value for class CN is: 0.6968504
##
## Class: Dementia
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.5867
## The AUC value for class Dementia is: 0.5866883
##
## Class: MCI
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.641
## The AUC value for class MCI is: 0.641038
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Median_rf_AUC<-mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.6415256
print(FeatEval_Median_rf_AUC)
## [1] 0.6415256
df_SVM<-processed_data
featureName_SVM1<-AfterProcess_FeatureName
trainIndex <- createDataPartition(df_SVM$DX, p = 0.7, list = FALSE)
train_data_SVM1 <- df_SVM[trainIndex, ]
test_data_SVM1 <- df_SVM[-trainIndex, ]
X_train_SVM1 <- subset(train_data_SVM1,select = -DX)
y_train_SVM1 <- train_data_SVM1$DX
X_test_SVM1 <- subset(test_data_SVM1, select= -DX )
y_test_SVM1 <- test_data_SVM1$DX
train_control <- trainControl(method = "cv", number = 5, classProbs = TRUE)
svm_model <- caret::train(DX ~ ., data = train_data_SVM1,
method = "svmRadial",
trControl = train_control)
print(svm_model)
## Support Vector Machines with Radial Basis Function Kernel
##
## 455 samples
## 155 predictors
## 3 classes: 'CN', 'Dementia', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 363, 364, 364, 364, 365
## Resampling results across tuning parameters:
##
## C Accuracy Kappa
## 0.25 0.6725089 0.4667618
## 0.50 0.6769045 0.4687381
## 1.00 0.6813012 0.4718549
##
## Tuning parameter 'sigma' was held constant at a value of 0.003284607
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.003284607 and C = 1.
print(svm_model$bestTune)
## sigma C
## 3 0.003284607 1
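By default caret tries only three cost values with `sigma` estimated by `kernlab::sigest`; if a wider search is wanted, a custom `tuneGrid` can be supplied (the grid values below are illustrative, not tuned):

```r
# Illustrative grid around the sigma caret estimated above.
svm_grid <- expand.grid(sigma = c(0.001, 0.003, 0.01),
                        C = c(0.25, 0.5, 1, 2, 4))
svm_model_tuned <- caret::train(DX ~ ., data = train_data_SVM1,
                                method = "svmRadial",
                                trControl = train_control,
                                tuneGrid = svm_grid)
print(svm_model_tuned$bestTune)
```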
mean_accuracy_svm_model<- mean(svm_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_svm_model)
## [1] 0.6769049
FeatEval_Median_mean_accuracy_cv_svm<-mean_accuracy_svm_model
print(FeatEval_Median_mean_accuracy_cv_svm)
## [1] 0.6769049
train_predictions <- predict(svm_model, newdata = train_data_SVM1)
train_accuracy <- mean(train_predictions == train_data_SVM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.953846153846154"
FeatEval_Median_svm_trainAccuracy <- train_accuracy
print(FeatEval_Median_svm_trainAccuracy)
## [1] 0.9538462
predictions <- predict(svm_model, newdata = test_data_SVM1)
cm_FeatEval_Median_svm<-caret::confusionMatrix(predictions,test_data_SVM1$DX)
print(cm_FeatEval_Median_svm)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN Dementia MCI
## CN 44 3 12
## Dementia 5 17 4
## MCI 17 8 83
##
## Overall Statistics
##
## Accuracy : 0.7461
## 95% CI : (0.6786, 0.8059)
## No Information Rate : 0.513
## P-Value [Acc > NIR] : 2.736e-11
##
## Kappa : 0.5689
##
## Mcnemar's Test P-Value : 0.441
##
## Statistics by Class:
##
## Class: CN Class: Dementia Class: MCI
## Sensitivity 0.6667 0.60714 0.8384
## Specificity 0.8819 0.94545 0.7340
## Pos Pred Value 0.7458 0.65385 0.7685
## Neg Pred Value 0.8358 0.93413 0.8118
## Prevalence 0.3420 0.14508 0.5130
## Detection Rate 0.2280 0.08808 0.4301
## Detection Prevalence 0.3057 0.13472 0.5596
## Balanced Accuracy 0.7743 0.77630 0.7862
cm_FeatEval_Median_svm_Accuracy <- cm_FeatEval_Median_svm$overall["Accuracy"]
cm_FeatEval_Median_svm_Kappa <- cm_FeatEval_Median_svm$overall["Kappa"]
print(cm_FeatEval_Median_svm_Accuracy)
## Accuracy
## 0.746114
print(cm_FeatEval_Median_svm_Kappa)
## Kappa
## 0.5688625
Let’s take a look at the feature importance of the trained model.
library(iml)
predictor_SVM <- Predictor$new(svm_model,data = df_SVM,y=df_SVM$DX)
importance_SVM <- FeatureImp$new(predictor_SVM,loss="ce")
print(importance_SVM)
## Interpretation method: FeatureImp
## error function: ce
##
## Analysed predictor:
## Prediction task: classification
## Classes:
##
## Analysed data:
## Sampling from data.frame with 648 rows and 156 columns.
##
##
## Head of results:
## feature importance.05 importance importance.95 permutation.error
## 1 cg05234269 1.028571 1.100000 1.125714 0.1188272
## 2 cg24851651 1.057143 1.100000 1.122857 0.1188272
## 3 cg04248279 1.060000 1.085714 1.085714 0.1172840
## 4 PC1 1.034286 1.071429 1.105714 0.1157407
## 5 cg02225060 1.028571 1.071429 1.111429 0.1157407
## 6 cg11133939 1.057143 1.071429 1.097143 0.1157407
plot(importance_SVM)
library(vip)
vip(svm_model, method = "permute", train = train_data_SVM1, target = "DX", nsim = 10, metric = "bal_accuracy", pred_wrapper = predict)
importance_SVM_df<-importance_SVM$results
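The permutation-importance results are again an ordered data frame, so the same top-N filter used for XGBoost and random forest applies here too (`Top_N` is a hypothetical cutoff):

```r
# FeatureImp$results is sorted by importance in decreasing order;
# the 'feature' column holds the predictor names.
Top_N <- 20  # hypothetical cutoff
top_features_svm <- head(importance_SVM_df$feature, Top_N)
print(top_features_svm)
```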
if(METHOD_FEATURE_FLAG == 5){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The AUC value is:")
auc_value <- roc_curve$auc
FeatEval_Median_svm_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The AUC value is:")
auc_value <- roc_curve$auc
FeatEval_Median_svm_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The AUC value is:")
auc_value <- roc_curve$auc
FeatEval_Median_svm_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(test_data_SVM1$DX)
for (class in classes) {
binary_labels <- ifelse(test_data_SVM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes,
       col = c("blue", palette()[3:(length(classes) + 1)]), lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls > cases
## Setting levels: control = 0, case = 1
## Setting direction: controls > cases
## Class: CN
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.5389
## The AUC value for class CN is: 0.5388929
##
## Class: Dementia
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) > 28 cases (binary_labels 1).
## Area under the curve: 0.5162
## The AUC value for class Dementia is: 0.5162338
##
## Class: MCI
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) > 99 cases (binary_labels 1).
## Area under the curve: 0.5044
## The AUC value for class MCI is: 0.5044058
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Median_svm_AUC <- mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.5198441
print(FeatEval_Median_svm_AUC)
## [1] 0.5198441
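The four flag-specific ROC blocks above differ only in which class is treated as positive (MCI for flag 5, Dementia for flags 4 and 6, CI for flag 3), so they could be collapsed into one helper. A hedged sketch of that refactoring; `binary_auc` and `positive_class` are names introduced here for illustration:

```r
library(pROC)

# Compute a binary ROC/AUC for a given positive class, mirroring the
# flag-specific blocks above.
binary_auc <- function(model, test_df, positive_class) {
  probs <- predict(model, newdata = test_df, type = "prob")
  roc_curve <- roc(test_df$DX, probs[, positive_class],
                   levels = rev(levels(test_df$DX)))
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
  as.numeric(roc_curve$auc)
}

# Map each binary-classification flag to its positive class.
positive_class <- switch(as.character(METHOD_FEATURE_FLAG),
                         "3" = "CI", "4" = "Dementia",
                         "5" = "MCI", "6" = "Dementia", NA)
if (!is.na(positive_class)) {
  FeatEval_Median_svm_AUC <- binary_auc(svm_model, test_data_SVM1, positive_class)
}
```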
Performance of the selected output features based on the frequency method
processed_dataFrame<-df_process_Output_freq
processed_data<-output_Frequency_Feature
AfterProcess_FeatureName<-df_process_frequency_FeatureName
print(head(output_Frequency_Feature))
## # A tibble: 6 × 156
## DX PC1 PC2 PC3 cg00962106 cg02225060 cg14710850 cg27452255 cg02981548
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 MCI -0.214 0.0147 -0.0140 0.912 0.683 0.805 0.900 0.134
## 2 CN -0.173 0.0575 0.00506 0.538 0.827 0.809 0.659 0.522
## 3 CN -0.00367 0.0837 0.0291 0.504 0.521 0.829 0.901 0.510
## 4 Dementia -0.187 -0.0112 -0.0323 0.904 0.808 0.834 0.890 0.566
## 5 MCI 0.0268 0.0000165 0.0529 0.896 0.608 0.850 0.578 0.568
## 6 CN -0.0379 0.0157 -0.00869 0.886 0.764 0.821 0.881 0.508
## # ℹ 147 more variables: cg08861434 <dbl>, cg19503462 <dbl>, cg07152869 <dbl>, cg16749614 <dbl>,
## # cg05096415 <dbl>, cg23432430 <dbl>, cg17186592 <dbl>, cg00247094 <dbl>, cg09584650 <dbl>,
## # cg11133939 <dbl>, cg16715186 <dbl>, cg03129555 <dbl>, cg08857872 <dbl>, cg06864789 <dbl>,
## # cg14924512 <dbl>, cg16652920 <dbl>, cg03084184 <dbl>, cg26219488 <dbl>, cg20913114 <dbl>,
## # cg06378561 <dbl>, cg26948066 <dbl>, cg25259265 <dbl>, cg06536614 <dbl>, cg24859648 <dbl>,
## # cg12279734 <dbl>, cg03982462 <dbl>, cg05841700 <dbl>, cg11227702 <dbl>, cg12146221 <dbl>,
## # cg02621446 <dbl>, cg00616572 <dbl>, cg15535896 <dbl>, cg02372404 <dbl>, cg09854620 <dbl>, …
print(df_process_frequency_FeatureName)
## [1] "PC1" "PC2" "PC3" "cg00962106" "cg02225060" "cg14710850" "cg27452255"
## [8] "cg02981548" "cg08861434" "cg19503462" "cg07152869" "cg16749614" "cg05096415" "cg23432430"
## [15] "cg17186592" "cg00247094" "cg09584650" "cg11133939" "cg16715186" "cg03129555" "cg08857872"
## [22] "cg06864789" "cg14924512" "cg16652920" "cg03084184" "cg26219488" "cg20913114" "cg06378561"
## [29] "cg26948066" "cg25259265" "cg06536614" "cg24859648" "cg12279734" "cg03982462" "cg05841700"
## [36] "cg11227702" "cg12146221" "cg02621446" "cg00616572" "cg15535896" "cg02372404" "cg09854620"
## [43] "cg04248279" "cg20678988" "cg24861747" "cg10240127" "cg16771215" "cg01667144" "cg13080267"
## [50] "cg02494911" "cg10750306" "cg11438323" "cg06715136" "cg04412904" "cg12738248" "cg03071582"
## [57] "cg05570109" "cg15775217" "cg24873924" "cg17738613" "cg01921484" "cg10369879" "cg27341708"
## [64] "cg12534577" "cg18821122" "cg12682323" "cg05234269" "cg20685672" "cg12228670" "cg11331837"
## [71] "cg01680303" "cg17421046" "cg03088219" "cg02356645" "cg00322003" "cg01013522" "cg00272795"
## [78] "cg25758034" "cg26474732" "cg16579946" "cg07523188" "cg11187460" "cg14527649" "cg20370184"
## [85] "cg17429539" "cg20507276" "cg13885788" "cg16178271" "cg10738648" "cg26069044" "cg25879395"
## [92] "cg06112204" "cg23161429" "cg25436480" "cg26757229" "cg02932958" "cg18339359" "cg23916408"
## [99] "cg06950937" "cg12784167" "cg07480176" "cg15865722" "cg27577781" "cg05321907" "cg03660162"
## [106] "cg07138269" "cg20139683" "cg12284872" "cg03327352" "cg23658987" "cg21854924" "cg21697769"
## [113] "cg19512141" "cg08198851" "cg00675157" "cg01153376" "cg01933473" "cg12776173" "cg14564293"
## [120] "cg24851651" "cg22274273" "cg25561557" "cg21209485" "cg10985055" "cg14293999" "cg18819889"
## [127] "cg24506579" "cg19377607" "cg06697310" "cg00696044" "cg01549082" "cg01128042" "cg00999469"
## [134] "cg06118351" "cg12012426" "cg08584917" "cg27272246" "cg15633912" "cg16788319" "cg17906851"
## [141] "cg07028768" "cg27086157" "cg14240646" "cg00154902" "cg14307563" "cg02320265" "cg08779649"
## [148] "cg04664583" "cg12466610" "cg27639199" "cg15501526" "cg00689685" "cg01413796" "cg11247378"
## [155] "age.now"
print(length(df_process_frequency_FeatureName))
## [1] 155
Num_KeyFea_Frequency <- length(df_process_frequency_FeatureName)
print(head(df_process_Output_freq))
## DX PC1 PC2 PC3 cg00962106 cg02225060
## 200223270003_R02C01 MCI -0.214185447 1.470293e-02 -0.014043316 0.9124898 0.6828159
## 200223270003_R03C01 CN -0.172761185 5.745834e-02 0.005055871 0.5375751 0.8265195
## 200223270003_R06C01 CN -0.003667305 8.372861e-02 0.029143653 0.5040948 0.5209552
## 200223270003_R07C01 Dementia -0.186779607 -1.117250e-02 -0.032302430 0.9039029 0.8078889
## 200223270006_R01C01 MCI 0.026814649 1.650735e-05 0.052947950 0.8961556 0.6084903
## 200223270006_R04C01 CN -0.037862929 1.571950e-02 -0.008685676 0.8857597 0.7638781
## cg14710850 cg27452255 cg02981548 cg08861434 cg19503462 cg07152869
## 200223270003_R02C01 0.8048592 0.9001010 0.1342571 0.8768306 0.7951675 0.8284151
## 200223270003_R03C01 0.8090950 0.6593379 0.5220037 0.4352647 0.4537684 0.5050630
## 200223270003_R06C01 0.8285902 0.9012217 0.5098965 0.8698813 0.6997359 0.8352490
## 200223270003_R07C01 0.8336457 0.8898635 0.5660985 0.4709249 0.7189778 0.5194300
## 200223270006_R01C01 0.8500725 0.5779792 0.5678714 0.8618532 0.7301755 0.5025709
## 200223270006_R04C01 0.8207247 0.8809143 0.5079859 0.9058965 0.4207207 0.8080916
## cg16749614 cg05096415 cg23432430 cg17186592 cg00247094 cg09584650
## 200223270003_R02C01 0.8678741 0.9182527 0.9482702 0.9230463 0.5399349 0.08230254
## 200223270003_R03C01 0.8539348 0.5177819 0.9455418 0.8593448 0.9315640 0.09661586
## 200223270003_R06C01 0.5874127 0.6288426 0.9418716 0.8467599 0.5177874 0.52399749
## 200223270003_R07C01 0.5555391 0.6060271 0.9426559 0.4986373 0.5377765 0.11587211
## 200223270006_R01C01 0.8026346 0.5599588 0.9461736 0.8978999 0.9109309 0.42115185
## 200223270006_R04C01 0.7903978 0.5441200 0.9508404 0.9239750 0.5266535 0.56043178
## cg11133939 cg16715186 cg03129555 cg08857872 cg06864789 cg14924512
## 200223270003_R02C01 0.1282694 0.2742789 0.6079616 0.3395280 0.05369415 0.5303907
## 200223270003_R03C01 0.5920898 0.7946153 0.5785498 0.8181845 0.46053125 0.9160885
## 200223270003_R06C01 0.5127706 0.8124316 0.9137818 0.2970779 0.87513655 0.9088414
## 200223270003_R07C01 0.8474176 0.7773263 0.9043041 0.2954090 0.49020327 0.9081681
## 200223270006_R01C01 0.8589133 0.8334531 0.9286357 0.8935876 0.47852685 0.9111789
## 200223270006_R04C01 0.5246557 0.8039945 0.9088564 0.8901338 0.05423587 0.5331753
## cg16652920 cg03084184 cg26219488 cg20913114 cg06378561 cg26948066
## 200223270003_R02C01 0.9436000 0.8162981 0.9336638 0.36510482 0.9389306 0.4685225
## 200223270003_R03C01 0.9431222 0.7877128 0.9134707 0.80382984 0.9377503 0.5026045
## 200223270003_R06C01 0.9457161 0.4546397 0.9261878 0.03158439 0.5154019 0.9101976
## 200223270003_R07C01 0.9419785 0.7812413 0.9217866 0.81256840 0.9403569 0.9379543
## 200223270006_R01C01 0.9529417 0.7818230 0.4929692 0.81502059 0.4956816 0.9120181
## 200223270006_R04C01 0.9492648 0.7725853 0.9431574 0.90468830 0.9268832 0.8868608
## cg25259265 cg06536614 cg24859648 cg12279734 cg03982462 cg05841700
## 200223270003_R02C01 0.4356646 0.5824474 0.83777536 0.6435368 0.8562777 0.2923544
## 200223270003_R03C01 0.8893591 0.5746694 0.44392797 0.1494651 0.6023731 0.9146488
## 200223270003_R06C01 0.4201700 0.5773468 0.03341185 0.8760759 0.8778458 0.3737990
## 200223270003_R07C01 0.4455517 0.5848917 0.43582347 0.8674214 0.8860227 0.5046468
## 200223270006_R01C01 0.8423337 0.5669919 0.03087161 0.6454450 0.8703107 0.8419031
## 200223270006_R04C01 0.8460736 0.5718514 0.02588024 0.8660058 0.8792860 0.9286652
## cg11227702 cg12146221 cg02621446 cg00616572 cg15535896 cg02372404
## 200223270003_R02C01 0.86486075 0.2049284 0.8731313 0.9335067 0.3382952 0.03598249
## 200223270003_R03C01 0.49184121 0.1814927 0.8095534 0.9214079 0.9253926 0.02767285
## 200223270003_R06C01 0.02543724 0.8619250 0.7511582 0.9113633 0.3320191 0.03127855
## 200223270003_R07C01 0.45150971 0.1238469 0.8773609 0.9160238 0.9409104 0.55685785
## 200223270006_R01C01 0.89086877 0.2021598 0.2046541 0.4861334 0.9326027 0.02587736
## 200223270006_R04C01 0.87675947 0.1383786 0.7963817 0.9067928 0.9156401 0.02828648
## cg09854620 cg04248279 cg20678988 cg24861747 cg10240127 cg16771215
## 200223270003_R02C01 0.5220587 0.8534976 0.8438718 0.3540897 0.9250553 0.88389723
## 200223270003_R03C01 0.8739646 0.8458854 0.8548886 0.4309505 0.9403255 0.07196933
## 200223270003_R06C01 0.8973149 0.8332786 0.7786685 0.8071462 0.9056974 0.09949974
## 200223270003_R07C01 0.8958863 0.3303204 0.8260541 0.3347317 0.9396217 0.64234023
## 200223270006_R01C01 0.9075331 0.5966878 0.3295384 0.3544795 0.9262370 0.62679274
## 200223270006_R04C01 0.9318820 0.8939599 0.8541667 0.5997840 0.9240497 0.06970175
## cg01667144 cg13080267 cg02494911 cg10750306 cg11438323 cg06715136
## 200223270003_R02C01 0.8971484 0.78936656 0.3049435 0.04919915 0.4863471 0.3400192
## 200223270003_R03C01 0.3175389 0.78371483 0.2416332 0.55160081 0.8984559 0.9259109
## 200223270003_R06C01 0.9238364 0.09436069 0.2520909 0.54694332 0.8722772 0.9079807
## 200223270003_R07C01 0.8739442 0.09351259 0.2457032 0.59824543 0.5026756 0.6782105
## 200223270006_R01C01 0.2931961 0.45173796 0.8045030 0.53158639 0.8809646 0.8369052
## 200223270006_R04C01 0.8616530 0.49866715 0.7489283 0.05646838 0.8717937 0.8807568
## cg04412904 cg12738248 cg03071582 cg05570109 cg15775217 cg24873924
## 200223270003_R02C01 0.05088595 0.85430866 0.9187811 0.3466611 0.5707441 0.3060635
## 200223270003_R03C01 0.07717659 0.88010292 0.5844421 0.5866750 0.9168327 0.8640985
## 200223270003_R06C01 0.08253743 0.51121855 0.6245558 0.4046471 0.6042521 0.8259149
## 200223270003_R07C01 0.06217431 0.09131476 0.9283683 0.6014355 0.9062231 0.8333940
## 200223270006_R01C01 0.11888769 0.91529345 0.5715416 0.5774881 0.9083515 0.8761177
## 200223270006_R04C01 0.08885846 0.91911405 0.6534650 0.8756826 0.6383270 0.8585363
## cg17738613 cg01921484 cg10369879 cg27341708 cg12534577 cg18821122
## 200223270003_R02C01 0.6879612 0.90985496 0.9218784 0.48846610 0.8585231 0.9291309
## 200223270003_R03C01 0.6582258 0.90931369 0.3149306 0.02613847 0.8493466 0.5901603
## 200223270003_R06C01 0.1022257 0.92044873 0.9141081 0.86893582 0.8395241 0.5779620
## 200223270003_R07C01 0.8960156 0.91674311 0.9054415 0.02642300 0.8511384 0.9251431
## 200223270006_R01C01 0.8850702 0.02943747 0.2917862 0.47573455 0.8804655 0.9217018
## 200223270006_R04C01 0.8481916 0.89057041 0.9200403 0.89411974 0.3029013 0.5412250
## cg12682323 cg05234269 cg20685672 cg12228670 cg11331837 cg01680303
## 200223270003_R02C01 0.9397956 0.93848584 0.67121006 0.8632174 0.03692842 0.5095174
## 200223270003_R03C01 0.9003940 0.57461229 0.79320906 0.8496212 0.57150125 0.1344941
## 200223270003_R06C01 0.9157877 0.02467208 0.66136456 0.8738949 0.03182862 0.7573869
## 200223270003_R07C01 0.9048877 0.56516794 0.80838304 0.8362189 0.03832164 0.4772204
## 200223270006_R01C01 0.1065347 0.94829529 0.08291414 0.8079694 0.93008298 0.1176263
## 200223270006_R04C01 0.8836232 0.56298286 0.84460055 0.6966666 0.54004452 0.5133033
## cg17421046 cg03088219 cg02356645 cg00322003 cg01013522 cg00272795
## 200223270003_R02C01 0.9026993 0.844002862 0.5105903 0.1759911 0.6251168 0.46365138
## 200223270003_R03C01 0.9112100 0.007435243 0.5833923 0.5702070 0.8862821 0.82839260
## 200223270003_R06C01 0.8952031 0.120155222 0.5701428 0.3077122 0.5425308 0.07231279
## 200223270003_R07C01 0.9268852 0.826554308 0.5683381 0.6104341 0.8429862 0.78303831
## 200223270006_R01C01 0.1118337 0.066294915 0.5233692 0.6147419 0.0480531 0.78219952
## 200223270006_R04C01 0.4174370 0.574738383 0.9188670 0.2293759 0.8240222 0.44408249
## cg25758034 cg26474732 cg16579946 cg07523188 cg11187460 cg14527649
## 200223270003_R02C01 0.6114028 0.7843252 0.6306315 0.7509183 0.03672179 0.2678912
## 200223270003_R03C01 0.6649219 0.8184088 0.6648766 0.1524386 0.92516409 0.7954683
## 200223270003_R06C01 0.2393844 0.7358417 0.6455081 0.7127592 0.03109553 0.8350610
## 200223270003_R07C01 0.7071501 0.7509296 0.8979650 0.8464983 0.53283119 0.8428684
## 200223270006_R01C01 0.2301078 0.8294938 0.6886498 0.7847738 0.54038146 0.8231348
## 200223270006_R04C01 0.6891513 0.8033167 0.6766907 0.8231277 0.91096169 0.8022444
## cg20370184 cg17429539 cg20507276 cg13885788 cg16178271 cg10738648
## 200223270003_R02C01 0.37710950 0.7860900 0.12238910 0.9380618 0.6445416 0.44931577
## 200223270003_R03C01 0.05737964 0.7100923 0.38721972 0.9369476 0.6178075 0.49894016
## 200223270003_R06C01 0.04740505 0.7660838 0.47978438 0.5163017 0.6641952 0.05552024
## 200223270003_R07C01 0.83572095 0.6984969 0.02261996 0.9183376 0.7148058 0.03730440
## 200223270006_R01C01 0.04056608 0.6508597 0.37465798 0.5525542 0.6138954 0.54952781
## 200223270006_R04C01 0.04038589 0.2828452 0.03570795 0.9328289 0.9414188 0.59358167
## cg26069044 cg25879395 cg06112204 cg23161429 cg25436480 cg26757229
## 200223270003_R02C01 0.92401867 0.88130864 0.5251592 0.8956965 0.84251599 0.6723726
## 200223270003_R03C01 0.94072227 0.02603438 0.8773488 0.9099619 0.49940321 0.1422661
## 200223270003_R06C01 0.93321315 0.91060615 0.8867975 0.8833895 0.34943119 0.7933794
## 200223270003_R07C01 0.56567694 0.89205942 0.5613799 0.9134709 0.85244913 0.8074830
## 200223270006_R01C01 0.94369927 0.47886249 0.9184122 0.8738558 0.44545117 0.5265692
## 200223270006_R04C01 0.02040391 0.02145248 0.9152514 0.9104210 0.02575036 0.7341953
## cg02932958 cg18339359 cg23916408 cg06950937 cg12784167 cg07480176
## 200223270003_R02C01 0.7901008 0.8824858 0.1942275 0.8910968 0.81503498 0.5171664
## 200223270003_R03C01 0.4210489 0.9040272 0.9154993 0.2889345 0.02811410 0.3760452
## 200223270003_R06C01 0.3825995 0.8552121 0.8886255 0.9143801 0.03073269 0.6998389
## 200223270003_R07C01 0.7617081 0.3073106 0.8872447 0.8891079 0.84775699 0.2189042
## 200223270006_R01C01 0.8431126 0.8973742 0.2219945 0.8868617 0.83825789 0.5570021
## 200223270006_R04C01 0.7610084 0.2292800 0.1520624 0.9093273 0.45475291 0.4501196
## cg15865722 cg27577781 cg05321907 cg03660162 cg07138269 cg20139683
## 200223270003_R02C01 0.89438595 0.8143535 0.2880477 0.8691767 0.5002290 0.8717075
## 200223270003_R03C01 0.90194372 0.8113185 0.1782629 0.5160770 0.9426707 0.9059433
## 200223270003_R06C01 0.92118977 0.8144274 0.8427929 0.9026304 0.5057781 0.8962554
## 200223270003_R07C01 0.09230759 0.7970617 0.8320504 0.5305691 0.9400527 0.9218012
## 200223270006_R01C01 0.93422668 0.8640044 0.2422218 0.9257451 0.9321602 0.1708472
## 200223270006_R04C01 0.92220002 0.8840237 0.2429551 0.8935772 0.9333501 0.1067122
## cg12284872 cg03327352 cg23658987 cg21854924 cg21697769 cg19512141
## 200223270003_R02C01 0.8008333 0.8851712 0.79757644 0.8729132 0.8946108 0.8209161
## 200223270003_R03C01 0.7414569 0.8786878 0.07511718 0.7162342 0.2822953 0.7903543
## 200223270003_R06C01 0.7725267 0.3042310 0.10177571 0.7520990 0.8698740 0.8404684
## 200223270003_R07C01 0.7573369 0.8273211 0.46747992 0.8641284 0.9134887 0.2202759
## 200223270006_R01C01 0.7201607 0.8774082 0.76831297 0.6498895 0.2683820 0.8059589
## 200223270006_R04C01 0.8021446 0.8829492 0.08988532 0.5943113 0.2765740 0.7020247
## cg08198851 cg00675157 cg01153376 cg01933473 cg12776173 cg14564293
## 200223270003_R02C01 0.6578905 0.9188438 0.4872148 0.2589014 0.10388038 0.52089591
## 200223270003_R03C01 0.6578186 0.9242325 0.9639670 0.6726133 0.87306345 0.04000662
## 200223270003_R06C01 0.1272153 0.9254708 0.2242410 0.2642560 0.70094907 0.04959460
## 200223270003_R07C01 0.8351465 0.5447244 0.5155654 0.1978068 0.11367159 0.03114773
## 200223270006_R01C01 0.8791156 0.5173554 0.9588916 0.7599441 0.09458405 0.51703196
## 200223270006_R04C01 0.1423737 0.9247232 0.9586876 0.7405661 0.86532175 0.51535010
## cg24851651 cg22274273 cg25561557 cg21209485 cg10985055 cg14293999
## 200223270003_R02C01 0.03674702 0.4209386 0.76736369 0.8865053 0.8518169 0.2836710
## 200223270003_R03C01 0.05358297 0.4246379 0.03851635 0.8714878 0.8631895 0.9172023
## 200223270003_R06C01 0.05968923 0.4196796 0.47259480 0.2292550 0.5456633 0.9168166
## 200223270003_R07C01 0.60864179 0.4164100 0.43364249 0.2351526 0.8825100 0.9188336
## 200223270006_R01C01 0.08825834 0.7951105 0.46211439 0.8882046 0.8841690 0.1971116
## 200223270006_R04C01 0.91932068 0.0229810 0.44651530 0.2292483 0.8407797 0.9030919
## cg18819889 cg24506579 cg19377607 cg06697310 cg00696044 cg01549082
## 200223270003_R02C01 0.9156157 0.5244337 0.05377464 0.8454609 0.55608424 0.2924138
## 200223270003_R03C01 0.9004455 0.5794845 0.90570746 0.8653044 0.07552381 0.7065693
## 200223270003_R06C01 0.9054439 0.9427785 0.06636174 0.2405168 0.79270858 0.2895440
## 200223270003_R07C01 0.9089935 0.9323844 0.68788639 0.8479193 0.03548419 0.6422955
## 200223270006_R01C01 0.9065397 0.9185355 0.06338988 0.8206613 0.10714386 0.8471236
## 200223270006_R04C01 0.9242767 0.4332642 0.91551446 0.7839595 0.18420803 0.6949888
## cg01128042 cg00999469 cg06118351 cg12012426 cg08584917 cg27272246
## 200223270003_R02C01 0.9113420 0.3274080 0.36339400 0.9165048 0.5663205 0.8615873
## 200223270003_R03C01 0.5328806 0.2857719 0.47148604 0.9434768 0.9019732 0.8705287
## 200223270003_R06C01 0.5222757 0.2499229 0.86559618 0.9220044 0.9187789 0.8103777
## 200223270003_R07C01 0.5141721 0.2819622 0.83494303 0.9241284 0.6007449 0.0310881
## 200223270006_R01C01 0.9321215 0.2933539 0.02632111 0.9327143 0.9069098 0.7686536
## 200223270006_R04C01 0.5050081 0.2966623 0.83329300 0.9271167 0.9263584 0.4403542
## cg15633912 cg16788319 cg17906851 cg07028768 cg27086157 cg14240646
## 200223270003_R02C01 0.1605530 0.9379870 0.9488392 0.4496851 0.9224112 0.5391334
## 200223270003_R03C01 0.9333421 0.8913429 0.9529718 0.8536078 0.9219304 0.2538363
## 200223270003_R06C01 0.8737362 0.8680680 0.6462151 0.8356936 0.3224986 0.1864902
## 200223270003_R07C01 0.9137334 0.8811444 0.9553497 0.4245893 0.3455486 0.6402007
## 200223270006_R01C01 0.9169706 0.3123481 0.6222117 0.8835151 0.8988962 0.7696079
## 200223270006_R04C01 0.8890004 0.2995627 0.6441202 0.4514661 0.9159217 0.1490028
## cg00154902 cg14307563 cg02320265 cg08779649 cg04664583 cg12466610
## 200223270003_R02C01 0.5137741 0.1855966 0.8853213 0.44449401 0.5572814 0.05767659
## 200223270003_R03C01 0.8540746 0.8916957 0.4686314 0.45076825 0.5881190 0.59131778
## 200223270003_R06C01 0.8188126 0.8750052 0.4838749 0.04810217 0.9352717 0.06939623
## 200223270003_R07C01 0.4625776 0.8975663 0.8986848 0.42715969 0.9350230 0.04527733
## 200223270006_R01C01 0.4690086 0.8762842 0.8987560 0.89313476 0.9424588 0.05212904
## 200223270006_R04C01 0.4547219 0.9168614 0.4768520 0.59523771 0.9379537 0.05104033
## cg27639199 cg15501526 cg00689685 cg01413796 cg11247378 age.now
## 200223270003_R02C01 0.67515415 0.6362531 0.7019389 0.1345128 0.1591185 82.40000
## 200223270003_R03C01 0.67552763 0.6319253 0.8634268 0.2830672 0.7874849 78.60000
## 200223270003_R06C01 0.06233093 0.7435100 0.6378795 0.8194681 0.4807942 80.40000
## 200223270003_R07C01 0.05701332 0.7756577 0.8624541 0.9007710 0.4537348 78.16441
## 200223270006_R01C01 0.05037694 0.3230777 0.6361891 0.2603027 0.1537079 62.90000
## 200223270006_R04C01 0.08144161 0.8342695 0.6356260 0.9207672 0.1686356 80.67796
df_LRM1<-processed_data
featureName_LRM1<-AfterProcess_FeatureName
library(glmnet)
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_LRM1$DX, p = 0.7, list = FALSE)
trainData <- df_LRM1[trainIndex, ]
testData <- df_LRM1[-trainIndex, ]
dim(trainData)
## [1] 455 156
dim(testData)
## [1] 193 156
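`createDataPartition` samples within each level of `DX`, so the 70/30 split should preserve the class proportions. A quick hedged check of that assumption:

```r
# Compare class proportions in the full data, training set, and test set;
# they should be close because createDataPartition stratifies on DX.
round(rbind(full  = prop.table(table(df_LRM1$DX)),
            train = prop.table(table(trainData$DX)),
            test  = prop.table(table(testData$DX))), 3)
```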
ctrl <- trainControl(method = "cv", number = 5)
model_LRM1 <- caret::train(DX ~ ., data = trainData, method = "glmnet", trControl = ctrl)
predictions <- predict(model_LRM1, newdata = testData,type="raw")
cm_FeatEval_Freq_LRM1<-caret::confusionMatrix(predictions, testData$DX)
print(cm_FeatEval_Freq_LRM1)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN Dementia MCI
## CN 46 7 14
## Dementia 3 10 4
## MCI 17 11 81
##
## Overall Statistics
##
## Accuracy : 0.7098
## 95% CI : (0.6403, 0.7728)
## No Information Rate : 0.513
## P-Value [Acc > NIR] : 2.018e-08
##
## Kappa : 0.4987
##
## Mcnemar's Test P-Value : 0.1607
##
## Statistics by Class:
##
## Class: CN Class: Dementia Class: MCI
## Sensitivity 0.6970 0.35714 0.8182
## Specificity 0.8346 0.95758 0.7021
## Pos Pred Value 0.6866 0.58824 0.7431
## Neg Pred Value 0.8413 0.89773 0.7857
## Prevalence 0.3420 0.14508 0.5130
## Detection Rate 0.2383 0.05181 0.4197
## Detection Prevalence 0.3472 0.08808 0.5648
## Balanced Accuracy 0.7658 0.65736 0.7602
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
cm_FeatEval_Freq_LRM1_Accuracy <- cm_FeatEval_Freq_LRM1$overall["Accuracy"]
cm_FeatEval_Freq_LRM1_Kappa <- cm_FeatEval_Freq_LRM1$overall["Kappa"]
print(cm_FeatEval_Freq_LRM1_Accuracy)
## Accuracy
## 0.7098446
print(cm_FeatEval_Freq_LRM1_Kappa)
## Kappa
## 0.4987013
print(model_LRM1)
## glmnet
##
## 455 samples
## 155 predictors
## 3 classes: 'CN', 'Dementia', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 364, 365, 363, 364, 364
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0.10 0.0001810831 0.6350263 0.3962356
## 0.10 0.0018108309 0.6460636 0.4102125
## 0.10 0.0181083090 0.6548792 0.4144240
## 0.55 0.0001810831 0.6285290 0.3793868
## 0.55 0.0018108309 0.6505792 0.4121576
## 0.55 0.0181083090 0.6483336 0.3870111
## 1.00 0.0001810831 0.6065010 0.3457739
## 1.00 0.0018108309 0.6394930 0.3907984
## 1.00 0.0181083090 0.5867925 0.2663062
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.01810831.
train_predictions <- predict(model_LRM1, newdata = trainData, type = "raw")
train_accuracy <- mean(train_predictions == trainData$DX)
FeatEval_Freq_LRM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.96043956043956"
print(FeatEval_Freq_LRM1_trainAccuracy)
## [1] 0.9604396
mean_accuracy_model_LRM1 <- mean(model_LRM1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM1)
## [1] 0.6329108
FeatEval_Freq_mean_accuracy_cv_LRM1 <- mean_accuracy_model_LRM1
print(FeatEval_Freq_mean_accuracy_cv_LRM1)
## [1] 0.6329108
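Note that averaging `model_LRM1$results$Accuracy` mixes all nine tuning combinations, including poorly performing ones; the cross-validated accuracy of the selected model is the single row of `results` matching `bestTune` (per the tuning table above, the alpha = 0.1, lambda = 0.0181 row). A hedged sketch of extracting it:

```r
# CV accuracy for the chosen (alpha, lambda) pair only, rather than the
# mean over the whole tuning grid. merge() joins on the shared tuning columns.
best_row <- merge(model_LRM1$bestTune, model_LRM1$results)
print(best_row$Accuracy)
```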
library(caret)
library(pROC)
if (METHOD_FEATURE_FLAG ==5){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
FeatEval_Freq_LRM1_AUC <- auc_value
print(roc_curve)
print("The AUC value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG == 6) {
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
FeatEval_Freq_LRM1_AUC <- auc_value
print(roc_curve)
print("The AUC value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==3){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
FeatEval_Freq_LRM1_AUC <- auc_value
print(roc_curve)
print("The AUC value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==1){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData$DX)
for (class in classes) {
binary_labels <- ifelse(testData$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes,
       col = c("blue", palette()[3:(length(classes) + 1)]), lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.8487
## The AUC value for class CN is: 0.8487235
##
## Class: Dementia
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.831
## The AUC value for class Dementia is: 0.8309524
##
## Class: MCI
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.8189
## The AUC value for class MCI is: 0.818934
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Freq_LRM1_AUC <- mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.83287
importance_model_LRM1 <- varImp(model_LRM1)
print(importance_model_LRM1)
## glmnet variable importance
##
## variables are sorted by maximum importance across the classes
## only 20 most important variables shown (out of 155)
##
## CN Dementia MCI
## PC1 90.421 1.000e+02 0.000
## PC2 46.588 7.877e+01 0.000
## PC3 5.926 0.000e+00 68.326
## cg00962106 63.062 1.184e+01 36.931
## cg02225060 23.012 1.264e+01 51.144
## cg14710850 49.617 8.388e+00 25.395
## cg27452255 49.059 1.788e+01 11.818
## cg02981548 26.229 5.642e+00 49.026
## cg08861434 48.679 0.000e+00 42.749
## cg19503462 25.912 4.811e+01 5.779
## cg07152869 27.983 4.673e+01 1.349
## cg16749614 11.546 1.796e+01 45.937
## cg05096415 1.408 4.491e+01 28.926
## cg23432430 44.231 3.504e+00 25.258
## cg17186592 3.091 4.201e+01 26.690
## cg00247094 15.880 4.167e+01 10.430
## cg09584650 41.416 6.519e+00 18.541
## cg11133939 24.211 4.137e-03 40.491
## cg16715186 39.196 7.696e+00 17.048
## cg03129555 12.455 3.861e+01 8.425
plot(importance_model_LRM1, top = 20, main = "Variable Importance Plot")
importance_model_LRM1_df<-importance_model_LRM1$importance
if (METHOD_FEATURE_FLAG %in% c(3, 4, 5, 6)) {
importance_final_model_LRM1 <- varImp(model_LRM1$finalModel)
library(dplyr)
ordered_importance_final_model_LRM1 <- importance_final_model_LRM1 %>% arrange(desc(Overall))
print(ordered_importance_final_model_LRM1)
}
if(METHOD_FEATURE_FLAG==1){
# for the multi classification case,
# for each feature, we will choose the maximum importance value
# Add a column for the maximum importance
importance_model_LRM1_df$Feature<-rownames(importance_model_LRM1_df)
importance_model_LRM1_df <- importance_model_LRM1_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_model_LRM1_df)
}
## CN Dementia MCI Feature MaxImportance
## 1 90.4211914 1.000000e+02 0.00000000 PC1 100.0000000
## 2 46.5878880 7.876540e+01 0.00000000 PC2 78.7654000
## 3 5.9264601 0.000000e+00 68.32645216 PC3 68.3264522
## 4 63.0622141 1.184076e+01 36.93111422 cg00962106 63.0622141
## 5 23.0122858 1.263505e+01 51.14401263 cg02225060 51.1440126
## 6 49.6168920 8.388454e+00 25.39546443 cg14710850 49.6168920
## 7 49.0594808 1.787521e+01 11.81773067 cg27452255 49.0594808
## 8 26.2290394 5.642018e+00 49.02601867 cg02981548 49.0260187
## 9 48.6793071 0.000000e+00 42.74904997 cg08861434 48.6793071
## 10 25.9123492 4.811130e+01 5.77902211 cg19503462 48.1113013
## 11 27.9831242 4.672741e+01 1.34850115 cg07152869 46.7274142
## 12 11.5461891 1.795969e+01 45.93728159 cg16749614 45.9372816
## 13 1.4076950 4.490547e+01 28.92648458 cg05096415 44.9054669
## 14 44.2311676 3.504405e+00 25.25809202 cg23432430 44.2311676
## 15 3.0905298 4.200810e+01 26.68998266 cg17186592 42.0081020
## 16 15.8802196 4.167036e+01 10.43007825 cg00247094 41.6703641
## 17 41.4157285 6.518767e+00 18.54140543 cg09584650 41.4157285
## 18 24.2112262 4.137340e-03 40.49126516 cg11133939 40.4912652
## 19 39.1959650 7.696332e+00 17.04820793 cg16715186 39.1959650
## 20 12.4551007 3.861282e+01 8.42549585 cg03129555 38.6128234
## 21 3.1900587 2.010635e+01 38.48747637 cg08857872 38.4874764
## 22 12.1253718 3.682210e+01 11.12411712 cg06864789 36.8220973
## 23 0.0000000 3.530027e+01 26.72665764 cg14924512 35.3002711
## 24 7.2142317 1.187576e+01 34.92323965 cg16652920 34.9232396
## 25 19.1380264 3.459417e+01 0.00000000 cg03084184 34.5941651
## 26 3.6594506 1.335526e+01 34.15727920 cg26219488 34.1572792
## 27 13.4827036 3.380113e+01 6.05577323 cg20913114 33.8011320
## 28 7.1354711 3.347923e+01 11.82731988 cg06378561 33.4792338
## 29 33.3253288 1.548490e+01 2.09925955 cg26948066 33.3253288
## 30 0.5754957 3.329088e+01 17.46770718 cg25259265 33.2908817
## 31 33.2453952 0.000000e+00 21.54807871 cg06536614 33.2453952
## 32 1.6428181 3.231449e+01 17.24494950 cg24859648 32.3144876
## 33 12.7554337 3.077525e+01 2.20670041 cg12279734 30.7752546
## 34 30.6869432 1.116253e+01 2.48770629 cg03982462 30.6869432
## 35 1.2151951 3.061524e+01 16.61247988 cg05841700 30.6152440
## 36 29.8393321 7.666783e+00 7.71316484 cg11227702 29.8393321
## 37 25.3613752 0.000000e+00 29.01436336 cg12146221 29.0143634
## 38 9.6461764 8.950361e+00 28.93559238 cg02621446 28.9355924
## 39 0.0000000 2.259963e+01 28.82432888 cg00616572 28.8243289
## 40 28.4378228 8.978994e+00 6.54134607 cg15535896 28.4378228
## 41 25.4549752 0.000000e+00 28.22163976 cg02372404 28.2216398
## 42 5.0595151 2.778641e+01 8.14331041 cg09854620 27.7864115
## 43 27.6071611 0.000000e+00 15.87051880 cg04248279 27.6071611
## 44 4.0161409 7.689031e+00 27.54146620 cg20678988 27.5414662
## 45 0.0000000 2.751970e+01 13.83635369 cg24861747 27.5196952
## 46 27.4725483 1.566027e+01 0.00000000 cg10240127 27.4725483
## 47 7.7707423 7.230913e+00 27.21848262 cg16771215 27.2184826
## 48 0.6456766 2.697502e+01 14.65302759 cg01667144 26.9750157
## 49 26.9422636 8.953253e+00 2.80393480 cg13080267 26.9422636
## 50 0.0000000 2.616443e+01 26.57745234 cg02494911 26.5774523
## 51 9.3807400 2.645056e+01 5.12022061 cg10750306 26.4505591
## 52 25.4571262 1.207867e+00 11.26327346 cg11438323 25.4571262
## 53 4.8728912 4.039264e+00 25.43196817 cg06715136 25.4319682
## 54 25.1306218 0.000000e+00 15.37088902 cg04412904 25.1306218
## 55 4.7708807 2.485568e+01 5.40428961 cg12738248 24.8556786
## 56 24.4373049 0.000000e+00 18.64901449 cg03071582 24.4373049
## 57 0.0000000 2.430976e+01 15.78618979 cg05570109 24.3097592
## 58 24.2234675 2.027209e+01 0.00000000 cg15775217 24.2234675
## 59 0.0000000 1.993016e+01 24.20338222 cg24873924 24.2033822
## 60 7.5571475 4.145911e+00 24.12000164 cg17738613 24.1200016
## 61 23.8685802 0.000000e+00 20.76931879 cg01921484 23.8685802
## 62 0.0000000 1.628479e+01 23.70800855 cg10369879 23.7080086
## 63 0.0000000 1.838316e+01 23.65118269 cg27341708 23.6511827
## 64 0.0000000 2.355705e+01 21.42857702 cg12534577 23.5570536
## 65 0.0000000 2.343504e+01 17.81796855 cg18821122 23.4350422
## 66 4.6159402 6.920082e+00 23.35097600 cg12682323 23.3509760
## 67 23.3199397 0.000000e+00 14.18264753 cg05234269 23.3199397
## 68 23.0568545 0.000000e+00 22.77665933 cg20685672 23.0568545
## 69 20.3675386 0.000000e+00 22.86397213 cg12228670 22.8639721
## 70 22.7116735 3.671494e+00 8.32801266 cg11331837 22.7116735
## 71 0.0000000 2.268500e+01 20.87817301 cg01680303 22.6849966
## 72 22.4172909 1.166771e+00 10.22563277 cg17421046 22.4172909
## 73 22.2743617 1.928962e+01 0.00000000 cg00322003 22.2743617
## 74 22.2737985 8.049754e+00 2.25464756 cg03088219 22.2737985
## 75 22.2424977 1.528025e+01 0.00000000 cg02356645 22.2424977
## 76 5.8933810 2.207741e+01 1.26303407 cg01013522 22.0774149
## 77 12.6590030 0.000000e+00 21.77176163 cg00272795 21.7717616
## 78 21.6589655 0.000000e+00 14.53301798 cg25758034 21.6589655
## 79 4.7841837 2.163888e+01 1.18656219 cg26474732 21.6388832
## 80 0.0000000 2.126609e+01 17.64235223 cg16579946 21.2660881
## 81 9.5980250 2.121696e+01 0.00000000 cg07523188 21.2169601
## 82 21.2108554 4.532914e+00 5.64881948 cg11187460 21.2108554
## 83 0.0000000 1.703619e+01 20.80948044 cg14527649 20.8094804
## 84 2.7320807 4.853792e+00 20.53830769 cg20370184 20.5383077
## 85 20.5042022 0.000000e+00 13.74634533 cg17429539 20.5042022
## 86 0.0000000 2.029016e+01 10.01093515 cg20507276 20.2901584
## 87 1.1840742 6.815529e+00 20.19225461 cg13885788 20.1922546
## 88 0.0000000 1.556537e+01 20.08333284 cg16178271 20.0833328
## 89 5.5939181 1.529093e+00 19.98843387 cg10738648 19.9884339
## 90 5.1485052 1.992674e+01 2.76062708 cg26069044 19.9267407
## 91 3.2006638 4.954857e+00 19.79636965 cg25879395 19.7963697
## 92 19.6502331 0.000000e+00 12.11739199 cg06112204 19.6502331
## 93 3.2266078 1.921297e+01 1.25563874 cg23161429 19.2129728
## 94 19.0436333 0.000000e+00 8.86160146 cg25436480 19.0436333
## 95 18.8899765 1.898591e+01 0.00000000 cg26757229 18.9859061
## 96 18.8539813 8.150368e+00 0.00000000 cg02932958 18.8539813
## 97 6.3413123 1.862430e+01 0.95143518 cg18339359 18.6242952
## 98 18.5798880 1.513211e+00 1.88583048 cg06950937 18.5798880
## 99 12.0389141 1.857900e+01 0.00000000 cg23916408 18.5790048
## 100 1.5240724 3.185635e+00 18.16459540 cg12784167 18.1645954
## 101 11.9014906 0.000000e+00 18.13462538 cg07480176 18.1346254
## 102 0.0000000 5.493060e+00 17.69570496 cg15865722 17.6957050
## 103 17.6582944 0.000000e+00 13.07004017 cg27577781 17.6582944
## 104 17.1627270 2.947542e+00 2.52613791 cg05321907 17.1627270
## 105 16.8711800 0.000000e+00 7.57596539 cg03660162 16.8711800
## 106 16.7270034 0.000000e+00 9.94809152 cg07138269 16.7270034
## 107 16.7141446 8.370983e-04 5.48685893 cg20139683 16.7141446
## 108 1.5110047 1.660893e+01 3.59685475 cg12284872 16.6089331
## 109 16.5523009 0.000000e+00 15.31421809 cg03327352 16.5523009
## 110 0.0000000 1.652147e+01 12.90253736 cg23658987 16.5214740
## 111 0.0000000 1.473495e+01 16.18728682 cg21854924 16.1872868
## 112 15.7882584 0.000000e+00 6.82534076 cg21697769 15.7882584
## 113 15.6543993 5.763288e+00 0.00000000 cg19512141 15.6543993
## 114 10.3013242 0.000000e+00 15.49206958 cg08198851 15.4920696
## 115 0.4210073 1.508166e+01 0.82546402 cg00675157 15.0816601
## 116 0.0000000 5.704214e+00 15.02064792 cg01153376 15.0206479
## 117 1.8055061 1.496164e+01 0.76647969 cg01933473 14.9616407
## 118 14.8932694 0.000000e+00 4.58710848 cg12776173 14.8932694
## 119 0.0000000 1.065994e+01 14.72714332 cg14564293 14.7271433
## 120 12.4116879 0.000000e+00 14.56951596 cg24851651 14.5695160
## 121 0.0000000 1.452135e+01 2.25494783 cg22274273 14.5213516
## 122 12.7916783 1.451109e+01 0.00000000 cg25561557 14.5110857
## 123 13.7825866 1.440027e+01 0.00000000 cg21209485 14.4002713
## 124 3.9006296 1.430400e+01 0.00000000 cg10985055 14.3040000
## 125 8.0875881 0.000000e+00 14.25269895 cg14293999 14.2526989
## 126 0.0000000 6.075319e+00 13.99960727 cg18819889 13.9996073
## 127 7.9179369 1.389950e+01 0.00000000 cg24506579 13.8995029
## 128 10.4815163 0.000000e+00 13.82052426 cg19377607 13.8205243
## 129 2.6249344 1.359909e+01 0.00000000 cg06697310 13.5990934
## 130 13.5718494 0.000000e+00 10.16664025 cg00696044 13.5718494
## 131 0.0000000 0.000000e+00 13.10339546 cg01549082 13.1033955
## 132 0.0000000 6.890626e+00 13.07660092 cg01128042 13.0766009
## 133 0.2664728 1.247937e+01 1.15838593 cg00999469 12.4793749
## 134 0.0000000 1.077643e+01 12.38849837 cg06118351 12.3884984
## 135 0.0000000 1.124674e+01 11.78627370 cg12012426 11.7862737
## 136 11.7234564 9.459104e+00 0.00000000 cg08584917 11.7234564
## 137 0.0000000 1.167309e+01 2.25087645 cg15633912 11.6730940
## 138 11.6725674 0.000000e+00 11.20194712 cg27272246 11.6725674
## 139 11.3317090 1.972310e+00 0.00000000 cg17906851 11.3317090
## 140 1.1928989 1.133121e+01 0.00000000 cg16788319 11.3312105
## 141 8.9892259 0.000000e+00 11.29248054 cg07028768 11.2924805
## 142 0.0000000 3.124590e+00 10.74047353 cg27086157 10.7404735
## 143 1.7933150 9.609129e+00 0.00000000 cg14240646 9.6091292
## 144 0.0000000 9.463243e+00 9.19135560 cg00154902 9.4632430
## 145 6.6623080 0.000000e+00 9.11133270 cg14307563 9.1113327
## 146 0.0000000 8.531587e+00 0.00000000 cg02320265 8.5315872
## 147 8.2135222 0.000000e+00 7.03393427 cg08779649 8.2135222
## 148 7.6553807 0.000000e+00 7.95295636 cg04664583 7.9529564
## 149 0.0000000 0.000000e+00 6.58682703 cg12466610 6.5868270
## 150 6.2549228 3.707997e+00 0.00000000 cg27639199 6.2549228
## 151 0.0000000 0.000000e+00 5.80982245 cg15501526 5.8098225
## 152 0.0000000 4.840766e+00 3.67774019 cg00689685 4.8407663
## 153 2.7970199 0.000000e+00 0.08162381 cg01413796 2.7970199
## 154 0.0000000 0.000000e+00 2.13884039 cg11247378 2.1388404
## 155 0.5215083 0.000000e+00 0.63572687 age.now 0.6357269
# Install reshape2 on first use; require() already attaches it when available.
if (!require(reshape2)) {
  install.packages("reshape2")
  library(reshape2)
}
if(METHOD_FEATURE_FLAG == 1){
importance_melted_LRM1_df <- importance_model_LRM1_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
## Warning: The melt generic in data.table has been passed a data.frame and will attempt to
## redirect to the relevant reshape2 method; please note that reshape2 is superseded and is no
## longer actively developed, and this redirection is now deprecated. To continue using melt
## methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the
## namespace, i.e. reshape2::melt(.). In the next version, this warning will become an error.
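Since reshape2 is superseded (per the warning above), the same wide-to-long reshape can be written with tidyr. A minimal sketch, assuming tidyr is installed; it produces the same long-format columns as the melt() call:

```r
# Equivalent long format without the reshape2 redirection warning:
# one row per (Feature, Class) pair with its Importance value.
importance_melted_LRM1_df <- importance_model_LRM1_df %>%
  dplyr::select(-MaxImportance) %>%
  tidyr::pivot_longer(cols = -Feature,
                      names_to = "Class", values_to = "Importance")
```

Note that pivot_longer() returns Class as character rather than factor; wrap it in factor() if the plotting code depends on level order.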
if(METHOD_FEATURE_FLAG == 1){
print(importance_model_LRM1_df %>% head(20))
print("The top 20 features ranked by maximum importance:")
print(head(importance_model_LRM1_df,n=20)$Feature)
importance_melted_LRM1_df <- importance_model_LRM1_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
## CN Dementia MCI Feature MaxImportance
## 1 90.421191 100.00000000 0.000000 PC1 100.00000
## 2 46.587888 78.76540001 0.000000 PC2 78.76540
## 3 5.926460 0.00000000 68.326452 PC3 68.32645
## 4 63.062214 11.84075872 36.931114 cg00962106 63.06221
## 5 23.012286 12.63505366 51.144013 cg02225060 51.14401
## 6 49.616892 8.38845390 25.395464 cg14710850 49.61689
## 7 49.059481 17.87521244 11.817731 cg27452255 49.05948
## 8 26.229039 5.64201829 49.026019 cg02981548 49.02602
## 9 48.679307 0.00000000 42.749050 cg08861434 48.67931
## 10 25.912349 48.11130126 5.779022 cg19503462 48.11130
## 11 27.983124 46.72741425 1.348501 cg07152869 46.72741
## 12 11.546189 17.95969178 45.937282 cg16749614 45.93728
## 13 1.407695 44.90546688 28.926485 cg05096415 44.90547
## 14 44.231168 3.50440539 25.258092 cg23432430 44.23117
## 15 3.090530 42.00810202 26.689983 cg17186592 42.00810
## 16 15.880220 41.67036405 10.430078 cg00247094 41.67036
## 17 41.415728 6.51876720 18.541405 cg09584650 41.41573
## 18 24.211226 0.00413734 40.491265 cg11133939 40.49127
## 19 39.195965 7.69633216 17.048208 cg16715186 39.19596
## 20 12.455101 38.61282340 8.425496 cg03129555 38.61282
## [1] "The top 20 features ranked by maximum importance:"
## [1] "PC1" "PC2" "PC3" "cg00962106" "cg02225060" "cg14710850" "cg27452255"
## [8] "cg02981548" "cg08861434" "cg19503462" "cg07152869" "cg16749614" "cg05096415" "cg23432430"
## [15] "cg17186592" "cg00247094" "cg09584650" "cg11133939" "cg16715186" "cg03129555"
## Warning: The melt generic in data.table has been passed a data.frame and will attempt to
## redirect to the relevant reshape2 method; please note that reshape2 is superseded and is no
## longer actively developed, and this redirection is now deprecated. To continue using melt
## methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the
## namespace, i.e. reshape2::melt(.). In the next version, this warning will become an error.
table(df_LRM1$DX)
##
## CN Dementia MCI
## 221 94 333
prop.table(table(df_LRM1$DX))
##
## CN Dementia MCI
## 0.3410494 0.1450617 0.5138889
table(trainData$DX)
##
## CN Dementia MCI
## 155 66 234
prop.table(table(trainData$DX))
##
## CN Dementia MCI
## 0.3406593 0.1450549 0.5142857
barplot(table(df_LRM1$DX), main = "Whole Data Class Distribution")
For the training data set:
barplot(table(trainData$DX), main = "Train Data Class Distribution")
Let’s calculate the imbalance ratio, i.e. the number of samples in the majority class divided by the number of samples in the minority class. A high ratio indicates severe class imbalance.
class_counts <- table(df_LRM1$DX)
imbalance_ratio <- max(class_counts) / min(class_counts)
print("The imbalance ratio of the whole data set is:")
## [1] "The imbalance ratio of the whole data set is:"
print(imbalance_ratio)
## [1] 3.542553
class_counts <- table(trainData$DX)
imbalance_ratio <- max(class_counts) / min(class_counts)
print("The imbalance ratio of the training data set is:")
## [1] "The imbalance ratio of the training data set is:"
print(imbalance_ratio)
## [1] 3.545455
Let’s run a chi-squared test, which can determine whether the class distribution deviates significantly from a balanced distribution; the p-value reported by the test indicates how significant the class imbalance is.
chisq.test(table(df_LRM1$DX))
##
## Chi-squared test for given probabilities
##
## data: table(df_LRM1$DX)
## X-squared = 132.4, df = 2, p-value < 2.2e-16
chisq.test(table(trainData$DX))
##
## Chi-squared test for given probabilities
##
## data: table(trainData$DX)
## X-squared = 93.156, df = 2, p-value < 2.2e-16
library(smotefamily)
smote_data_LGR_1 <- SMOTE(X = trainData[, !names(trainData) %in% "DX"], target = trainData$DX, K = 5, dup_size = 1)
balanced_data_LGR_1 <- smote_data_LGR_1$data
colnames(balanced_data_LGR_1)[colnames(balanced_data_LGR_1) == "class"] <- "DX"
table(balanced_data_LGR_1$DX)
##
## CN Dementia MCI
## 155 132 234
dim(balanced_data_LGR_1)
## [1] 521 156
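For context, smotefamily’s dup_size argument controls how many synthetic samples SMOTE generates per original minority-class sample, which is why dup_size = 1 above roughly doubled the Dementia class (66 to 132). A minimal sketch of stronger oversampling; dup_size = 2 is an illustrative value, not a tuned choice, and smote_data_LGR_strong is a hypothetical name:

```r
# Two synthetic samples per minority sample: the minority class
# roughly triples instead of doubling.
smote_data_LGR_strong <- SMOTE(X = trainData[, !names(trainData) %in% "DX"],
                               target = trainData$DX, K = 5, dup_size = 2)
table(smote_data_LGR_strong$data$class)
```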
ctrl <- trainControl(method = "cv", number = 5)
model_LRM2 <- caret::train(DX ~ ., data = balanced_data_LGR_1, method = "glmnet", trControl = ctrl)
predictions <- predict(model_LRM2, newdata = testData)
caret::confusionMatrix(predictions, testData$DX)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN Dementia MCI
## CN 45 6 15
## Dementia 4 11 6
## MCI 17 11 78
##
## Overall Statistics
##
## Accuracy : 0.6943
## 95% CI : (0.6241, 0.7584)
## No Information Rate : 0.513
## P-Value [Acc > NIR] : 2.356e-07
##
## Kappa : 0.4779
##
## Mcnemar's Test P-Value : 0.5733
##
## Statistics by Class:
##
## Class: CN Class: Dementia Class: MCI
## Sensitivity 0.6818 0.39286 0.7879
## Specificity 0.8346 0.93939 0.7021
## Pos Pred Value 0.6818 0.52381 0.7358
## Neg Pred Value 0.8346 0.90116 0.7586
## Prevalence 0.3420 0.14508 0.5130
## Detection Rate 0.2332 0.05699 0.4041
## Detection Prevalence 0.3420 0.10881 0.5492
## Balanced Accuracy 0.7582 0.66613 0.7450
print(model_LRM2)
## glmnet
##
## 521 samples
## 155 predictors
## 3 classes: 'CN', 'Dementia', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 416, 417, 417, 417, 417
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0.10 0.000186946 0.7103114 0.5552305
## 0.10 0.001869460 0.7121978 0.5563269
## 0.10 0.018694597 0.7160989 0.5621857
## 0.55 0.000186946 0.6987912 0.5369622
## 0.55 0.001869460 0.7102930 0.5525186
## 0.55 0.018694597 0.6872894 0.5142517
## 1.00 0.000186946 0.6834432 0.5136505
## 1.00 0.001869460 0.7045238 0.5443300
## 1.00 0.018694597 0.6468864 0.4489232
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.0186946.
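The search above used caret’s default glmnet grid (three values each of alpha and lambda). A minimal sketch of a wider, user-defined search; the grid values here are illustrative assumptions, and model_LRM2_wide is a hypothetical name:

```r
# Wider elastic-net grid, re-using ctrl and the SMOTE-balanced training data.
tune_grid <- expand.grid(alpha  = seq(0, 1, by = 0.25),
                         lambda = 10^seq(-4, -1, length.out = 10))
model_LRM2_wide <- caret::train(DX ~ ., data = balanced_data_LGR_1,
                                method = "glmnet", trControl = ctrl,
                                tuneGrid = tune_grid)
```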
train_predictions <- predict(model_LRM2, newdata = trainData, type = "raw")
train_accuracy <- mean(train_predictions == trainData$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.958241758241758"
mean_accuracy_model_LRM2 <- mean(model_LRM2$results$Accuracy)
print("The mean accuracy of resampling results across tuning parameters is:")
## [1] "The mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM2)
## [1] 0.6966484
importance_model_LRM2 <- varImp(model_LRM2)
print(importance_model_LRM2)
## glmnet variable importance
##
## variables are sorted by maximum importance across the classes
## only 20 most important variables shown (out of 155)
##
## CN Dementia MCI
## PC1 80.688 100.000 0.000
## PC2 38.814 80.731 0.000
## cg00962106 56.202 9.099 33.493
## PC3 7.467 0.000 55.895
## cg19503462 26.324 48.654 6.540
## cg27452255 47.910 21.190 8.082
## cg07152869 27.972 45.992 1.294
## cg05096415 3.341 45.590 28.316
## cg02225060 18.264 12.784 45.588
## cg14710850 45.329 8.655 21.701
## cg02981548 23.095 5.927 45.307
## cg08861434 44.864 0.000 36.603
## cg03129555 14.463 42.033 10.566
## cg23432430 41.989 6.879 20.293
## cg16749614 8.920 17.010 41.732
## cg17186592 3.597 40.136 25.167
## cg14924512 1.857 38.982 23.218
## cg09584650 38.237 7.571 15.083
## cg06864789 13.550 38.081 11.898
## cg03084184 19.832 37.856 3.065
plot(importance_model_LRM2, top = 20, main = "Variable Importance Plot")
importance_model_LRM2_df<-importance_model_LRM2$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5|| METHOD_FEATURE_FLAG==6){
importance_final_model_LRM2 <- varImp(model_LRM2$finalModel)
library(dplyr)
ordered_importance_final_model_LRM2 <- importance_final_model_LRM2 %>% arrange(desc(Overall))
print(ordered_importance_final_model_LRM2)
}
if(METHOD_FEATURE_FLAG==1){
# For the multi-class case, keep each feature's maximum
# importance across the three classes and sort by it.
importance_model_LRM2_df$Feature<-rownames(importance_model_LRM2_df)
importance_model_LRM2_df <- importance_model_LRM2_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_model_LRM2_df)
}
## CN Dementia MCI Feature MaxImportance
## 1 80.688499148 100.00000000 0.000000000 PC1 100.0000000
## 2 38.814293894 80.73116981 0.000000000 PC2 80.7311698
## 3 56.201953611 9.09911968 33.493095025 cg00962106 56.2019536
## 4 7.467341807 0.00000000 55.895232595 PC3 55.8952326
## 5 26.324357187 48.65428391 6.540366672 cg19503462 48.6542839
## 6 47.909844955 21.19003754 8.081998635 cg27452255 47.9098450
## 7 27.972226763 45.99197183 1.294371256 cg07152869 45.9919718
## 8 3.340505452 45.58983812 28.315599957 cg05096415 45.5898381
## 9 18.264219950 12.78421147 45.587782522 cg02225060 45.5877825
## 10 45.329062512 8.65503354 21.701060361 cg14710850 45.3290625
## 11 23.094859080 5.92669780 45.307400928 cg02981548 45.3074009
## 12 44.864165165 0.00000000 36.602762852 cg08861434 44.8641652
## 13 14.463094695 42.03327794 10.566238507 cg03129555 42.0332779
## 14 41.989180142 6.87894102 20.292655519 cg23432430 41.9891801
## 15 8.920053612 17.00950243 41.731798290 cg16749614 41.7317983
## 16 3.597324390 40.13634499 25.166657777 cg17186592 40.1363450
## 17 1.857038695 38.98191970 23.217726210 cg14924512 38.9819197
## 18 38.237414837 7.57137035 15.082695906 cg09584650 38.2374148
## 19 13.550341036 38.08096223 11.897814383 cg06864789 38.0809622
## 20 19.832322635 37.85594401 3.065421777 cg03084184 37.8559440
## 21 21.501742479 0.51318561 37.528872407 cg11133939 37.5288724
## 22 13.600172125 37.19457507 9.115549215 cg00247094 37.1945751
## 23 0.542713010 20.67739292 35.721943012 cg08857872 35.7219430
## 24 35.486170490 7.96249429 14.042364819 cg16715186 35.4861705
## 25 4.935661334 35.04584177 17.439094906 cg24859648 35.0458418
## 26 14.089477135 34.55819164 5.437526844 cg12279734 34.5581916
## 27 1.732652241 34.10645155 18.443144849 cg25259265 34.1064516
## 28 8.424378469 34.06823359 11.647480632 cg06378561 34.0682336
## 29 2.315256778 13.35831728 31.978113840 cg26219488 31.9781138
## 30 12.472634453 31.59090814 5.775875747 cg20913114 31.5909081
## 31 5.489171457 11.24392869 31.377264132 cg16652920 31.3772641
## 32 1.405245869 30.97052414 17.379212292 cg05841700 30.9705241
## 33 29.676489142 14.07539866 0.805144625 cg26948066 29.6764891
## 34 28.737869928 12.28540079 0.030031567 cg03982462 28.7378699
## 35 28.258759627 8.09966055 6.642149391 cg11227702 28.2587596
## 36 6.460205795 28.05459552 8.137764089 cg09854620 28.0545955
## 37 27.468559856 0.00000000 21.547011800 cg06536614 27.4685599
## 38 7.547175154 9.69391892 27.097950316 cg02621446 27.0979503
## 39 0.000000000 27.00622824 24.128528762 cg02494911 27.0062282
## 40 20.457069989 0.00000000 26.624603334 cg12146221 26.6246033
## 41 0.000000000 25.79623162 26.592044153 cg00616572 26.5920442
## 42 9.536847794 26.42367491 5.644628072 cg10750306 26.4236749
## 43 26.167765420 7.87816983 6.038449805 cg15535896 26.1677654
## 44 1.140068212 25.93700297 13.651063404 cg01667144 25.9370030
## 45 0.000000000 25.63682735 13.470826797 cg24861747 25.6368274
## 46 25.542272183 15.10162869 0.000000000 cg10240127 25.5422722
## 47 24.118999307 0.00000000 25.125563097 cg02372404 25.1255631
## 48 1.111147195 8.19926972 25.064949566 cg06715136 25.0649496
## 49 24.852255463 0.00000000 16.124340115 cg20685672 24.8522555
## 50 0.000000000 24.78192605 14.617043174 cg05570109 24.7819260
## 51 24.731682044 0.00000000 13.462559109 cg04248279 24.7316820
## 52 4.046691734 5.49746420 24.336610598 cg20678988 24.3366106
## 53 0.000000000 24.20191047 18.411721155 cg12534577 24.2019105
## 54 0.000000000 24.13895507 15.855364668 cg16579946 24.1389551
## 55 4.826389136 24.12762377 5.718101058 cg12738248 24.1276238
## 56 6.529632908 5.92909446 24.064967745 cg16771215 24.0649677
## 57 24.017430734 10.17069127 0.028010156 cg13080267 24.0174307
## 58 5.507030815 5.66722760 23.062350711 cg17738613 23.0623507
## 59 22.325509240 6.54751969 5.652277581 cg11331837 22.3255092
## 60 0.000000000 22.28286919 17.240406275 cg01680303 22.2828692
## 61 22.209623590 0.00000000 13.211566870 cg04412904 22.2096236
## 62 0.000000000 22.09277799 14.934897153 cg18821122 22.0927780
## 63 3.418961026 7.32533965 22.057112379 cg12682323 22.0571124
## 64 22.043809258 16.24182372 0.000000000 cg02356645 22.0438093
## 65 0.000000000 20.82019047 22.036364160 cg24873924 22.0363642
## 66 0.000000000 15.80301510 22.028192936 cg10369879 22.0281929
## 67 6.482408270 21.72869906 0.939018125 cg01013522 21.7286991
## 68 16.476106166 0.00000000 21.606900872 cg12228670 21.6069009
## 69 7.504673650 21.12600379 0.000000000 cg07523188 21.1260038
## 70 21.105189244 18.08808609 0.000000000 cg15775217 21.1051892
## 71 21.025740955 0.00000000 16.866708890 cg03071582 21.0257410
## 72 20.948619901 0.00000000 12.120945430 cg05234269 20.9486199
## 73 0.000000000 20.90529433 7.902213775 cg20507276 20.9052943
## 74 0.000000000 19.09507877 20.828114561 cg27341708 20.8281146
## 75 13.177017420 20.44183263 0.000000000 cg25561557 20.4418326
## 76 20.440424332 8.86938790 0.348103979 cg03088219 20.4404243
## 77 20.431484829 0.00000000 19.510679122 cg01921484 20.4314848
## 78 4.713093446 20.19254710 4.205227539 cg26069044 20.1925471
## 79 20.140991688 0.00000000 7.542878492 cg06112204 20.1409917
## 80 20.087109684 0.00000000 10.293148792 cg25758034 20.0871097
## 81 20.072192022 0.22939651 9.403893500 cg17421046 20.0721920
## 82 19.725429611 0.00000000 12.789799128 cg11438323 19.7254296
## 83 19.701064237 0.00000000 9.922748112 cg17429539 19.7010642
## 84 19.537050438 14.85961436 0.000000000 cg00322003 19.5370504
## 85 19.326127572 4.15858089 4.744396470 cg11187460 19.3261276
## 86 2.515250593 5.41832059 18.975214454 cg25879395 18.9752145
## 87 4.058436074 18.84858496 0.228016041 cg26474732 18.8485850
## 88 2.892526937 18.77824046 2.420405667 cg23161429 18.7782405
## 89 1.683266081 4.78596818 18.693202335 cg20370184 18.6932023
## 90 18.637545772 0.02146389 6.333570297 cg25436480 18.6375458
## 91 0.009452251 7.64519394 18.618965539 cg13885788 18.6189655
## 92 11.433741314 18.26499711 0.000000000 cg23916408 18.2649971
## 93 0.000000000 16.67198659 18.165974740 cg14527649 18.1659747
## 94 5.005436113 1.01254421 18.050518899 cg10738648 18.0505189
## 95 0.000000000 17.96896566 12.783854784 cg23658987 17.9689657
## 96 5.986548344 17.93929168 1.282747947 cg18339359 17.9392917
## 97 10.255499373 0.00000000 17.836580203 cg07480176 17.8365802
## 98 16.796579637 17.79312862 0.000000000 cg26757229 17.7931286
## 99 2.972810732 17.77785615 4.058961200 cg12284872 17.7778562
## 100 8.056052971 17.46817747 0.000000000 cg24506579 17.4681775
## 101 17.452539976 8.51228551 0.000000000 cg02932958 17.4525400
## 102 13.352445749 0.00000000 17.317905474 cg00272795 17.3179055
## 103 0.000000000 7.44221308 17.200532252 cg12784167 17.2005323
## 104 16.764866307 0.00000000 6.646108759 cg03660162 16.7648663
## 105 0.000000000 16.00794740 16.457904409 cg16178271 16.4579044
## 106 16.352599342 0.00000000 11.995443315 cg27577781 16.3525993
## 107 16.126430414 0.00000000 8.289563199 cg07138269 16.1264304
## 108 15.974517734 2.87942381 2.065851003 cg05321907 15.9745177
## 109 0.755584018 15.69076519 2.146573132 cg22274273 15.6907652
## 110 0.467255566 3.15546065 15.548529013 cg15865722 15.5485290
## 111 13.414416608 15.52962619 0.000000000 cg21209485 15.5296262
## 112 15.467767202 0.63649811 3.699141098 cg20139683 15.4677672
## 113 0.807255447 15.27246532 2.248226332 cg15633912 15.2724653
## 114 1.773934188 15.19614251 0.493410601 cg00675157 15.1961425
## 115 0.000000000 15.00431453 13.731857733 cg21854924 15.0043145
## 116 0.000000000 8.28396704 14.990763999 cg14564293 14.9907640
## 117 1.419752037 14.68128295 1.625955063 cg01933473 14.6812830
## 118 14.363558658 0.00000000 2.334430859 cg06950937 14.3635587
## 119 7.039607983 0.00000000 14.270472500 cg14293999 14.2704725
## 120 0.000000000 7.61428358 14.108315063 cg01128042 14.1083151
## 121 13.963221888 0.00000000 13.899173399 cg03327352 13.9632219
## 122 13.956539489 0.00000000 2.029774778 cg12776173 13.9565395
## 123 8.337809729 0.00000000 13.927313821 cg24851651 13.9273138
## 124 8.522498604 0.00000000 13.705406963 cg19377607 13.7054070
## 125 13.702247521 0.00000000 7.328782945 cg00696044 13.7022475
## 126 0.000000000 2.81157158 13.614359374 cg01153376 13.6143594
## 127 13.568528499 3.88094848 0.000000000 cg19512141 13.5685285
## 128 0.000000000 6.29526893 13.552242319 cg18819889 13.5522423
## 129 8.863676187 0.00000000 13.138149457 cg27272246 13.1381495
## 130 12.206523053 0.00000000 13.006438244 cg08198851 13.0064382
## 131 0.000000000 9.80650980 12.678148985 cg06118351 12.6781490
## 132 4.065417376 12.40020977 0.000000000 cg10985055 12.4002098
## 133 0.922128548 11.76022383 0.005104054 cg16788319 11.7602238
## 134 1.039323446 11.74071914 0.000000000 cg14240646 11.7407191
## 135 0.791497817 11.56711305 0.394794257 cg00999469 11.5671130
## 136 0.000000000 11.34388307 10.937086239 cg12012426 11.3438831
## 137 0.000000000 2.69796677 10.877759543 cg01549082 10.8777595
## 138 10.745029758 0.00000000 9.148608471 cg21697769 10.7450298
## 139 10.662680862 0.00000000 7.596063644 cg07028768 10.6626809
## 140 10.319768459 3.95272245 0.000000000 cg17906851 10.3197685
## 141 0.000000000 8.38289397 9.799519490 cg27086157 9.7995195
## 142 0.310175184 9.75654690 0.000000000 cg06697310 9.7565469
## 143 9.736065016 9.24207711 0.000000000 cg08584917 9.7360650
## 144 0.606268787 9.52991966 0.000000000 cg02320265 9.5299197
## 145 2.501266350 0.00000000 9.497362665 cg04664583 9.4973627
## 146 4.880778567 0.00000000 8.724030662 cg14307563 8.7240307
## 147 6.240259640 0.00000000 8.437061685 cg08779649 8.4370617
## 148 0.000000000 6.06486468 7.341051542 cg00154902 7.3410515
## 149 0.000000000 0.00000000 6.380539190 cg12466610 6.3805392
## 150 6.357489751 4.09173156 0.000000000 cg27639199 6.3574898
## 151 0.000000000 5.86104601 4.818679307 cg00689685 5.8610460
## 152 0.000000000 2.98997653 5.167419744 cg15501526 5.1674197
## 153 2.833682181 0.00000000 0.000000000 cg01413796 2.8336822
## 154 0.421065431 0.00000000 0.566138305 age.now 0.5661383
## 155 0.000000000 0.43502977 0.045856942 cg11247378 0.4350298
if(METHOD_FEATURE_FLAG == 1){
importance_melted_LRM2_df <- importance_model_LRM2_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM2_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
## Warning: The melt generic in data.table has been passed a data.frame and will attempt to
## redirect to the relevant reshape2 method; please note that reshape2 is superseded and is no
## longer actively developed, and this redirection is now deprecated. To continue using melt
## methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the
## namespace, i.e. reshape2::melt(.). In the next version, this warning will become an error.
if(METHOD_FEATURE_FLAG == 1){
print(importance_model_LRM2_df %>% head(20))
print("The top 20 features ranked by maximum importance:")
print(head(importance_model_LRM2_df,n=20)$Feature)
importance_melted_LRM2_df <- importance_model_LRM2_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM2_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
## CN Dementia MCI Feature MaxImportance
## 1 80.688499 100.000000 0.000000 PC1 100.00000
## 2 38.814294 80.731170 0.000000 PC2 80.73117
## 3 56.201954 9.099120 33.493095 cg00962106 56.20195
## 4 7.467342 0.000000 55.895233 PC3 55.89523
## 5 26.324357 48.654284 6.540367 cg19503462 48.65428
## 6 47.909845 21.190038 8.081999 cg27452255 47.90984
## 7 27.972227 45.991972 1.294371 cg07152869 45.99197
## 8 3.340505 45.589838 28.315600 cg05096415 45.58984
## 9 18.264220 12.784211 45.587783 cg02225060 45.58778
## 10 45.329063 8.655034 21.701060 cg14710850 45.32906
## 11 23.094859 5.926698 45.307401 cg02981548 45.30740
## 12 44.864165 0.000000 36.602763 cg08861434 44.86417
## 13 14.463095 42.033278 10.566239 cg03129555 42.03328
## 14 41.989180 6.878941 20.292656 cg23432430 41.98918
## 15 8.920054 17.009502 41.731798 cg16749614 41.73180
## 16 3.597324 40.136345 25.166658 cg17186592 40.13634
## 17 1.857039 38.981920 23.217726 cg14924512 38.98192
## 18 38.237415 7.571370 15.082696 cg09584650 38.23741
## 19 13.550341 38.080962 11.897814 cg06864789 38.08096
## 20 19.832323 37.855944 3.065422 cg03084184 37.85594
## [1] "The top 20 features ranked by maximum importance:"
## [1] "PC1" "PC2" "cg00962106" "PC3" "cg19503462" "cg27452255" "cg07152869"
## [8] "cg05096415" "cg02225060" "cg14710850" "cg02981548" "cg08861434" "cg03129555" "cg23432430"
## [15] "cg16749614" "cg17186592" "cg14924512" "cg09584650" "cg06864789" "cg03084184"
## Warning: The melt generic in data.table has been passed a data.frame and will attempt to
## redirect to the relevant reshape2 method; please note that reshape2 is superseded and is no
## longer actively developed, and this redirection is now deprecated. To continue using melt
## methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the
## namespace, i.e. reshape2::melt(.). In the next version, this warning will become an error.
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX, prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX, prob_predictions[, "Dementia"], levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX, prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData$DX)
for (class in classes) {
binary_labels <- ifelse(testData$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
roc_cols <- c("blue", "red", "green3")   # one color per class, reused in the legend
plot(roc_curves[[1]], col = roc_cols[1],
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = roc_cols[i], lwd = 2)
}
legend("bottomright", legend = classes, col = roc_cols, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.8505
## The AUC value for class CN is: 0.850513
##
## Class: Dementia
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.8357
## The AUC value for class Dementia is: 0.8357143
##
## Class: MCI
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.8188
## The AUC value for class MCI is: 0.8188266
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
}
## The mean AUC value across all classes with one versus rest method is: 0.835018
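For reference, the per-class AUC that `pROC::roc` reports in the one-versus-rest loop above can be reproduced in base R through the rank-sum (Mann-Whitney) identity. A minimal sketch on toy scores (the data below are made up for illustration, not taken from the models above):

```r
# One-vs-rest AUC via the rank-sum (Mann-Whitney) identity, base R only.
auc_rank <- function(labels, scores) {
  # labels: 0/1 vector; scores: predicted probability of the positive class
  r <- rank(scores)                    # mid-ranks handle ties
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example: positives tend to score higher than negatives
set.seed(1)
scores <- c(rnorm(10, 0.7, 0.1), rnorm(10, 0.4, 0.1))
labels <- rep(c(1, 0), each = 10)
auc_rank(labels, scores)
```

On the same labels and scores this agrees with `pROC::auc()`, since both reduce to the probability that a randomly chosen positive outscores a randomly chosen negative.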
df_ENM1<-processed_data
featureName_ENM1<-AfterProcess_FeatureName
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_ENM1$DX, p = 0.7, list = FALSE)
trainData_ENM1 <- df_ENM1[trainIndex, ]
testData_ENM1 <- df_ENM1[-trainIndex, ]
ctrl <- trainControl(method = "cv", number = 5)
param_grid <- expand.grid(alpha = 0:1, lambda = seq(0.001, 1, length = 20))
elastic_net_model1 <- caret::train(DX ~ ., data = trainData_ENM1, method = "glmnet",
trControl = ctrl, tuneGrid = param_grid)
print(elastic_net_model1)
## glmnet
##
## 455 samples
## 155 predictors
## 3 classes: 'CN', 'Dementia', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 364, 365, 363, 364, 364
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0 0.00100000 0.6571736 0.42345797
## 0 0.05357895 0.6725349 0.43439423
## 0 0.10615789 0.6747338 0.43094148
## 0 0.15873684 0.6725599 0.42391171
## 0 0.21131579 0.6725837 0.41818370
## 0 0.26389474 0.6770526 0.42406079
## 0 0.31647368 0.6769804 0.41856449
## 0 0.36905263 0.6726087 0.40853473
## 0 0.42163158 0.6638170 0.38542265
## 0 0.47421053 0.6660148 0.38902178
## 0 0.52678947 0.6594214 0.37628816
## 0 0.57936842 0.6550252 0.36510400
## 0 0.63194737 0.6528274 0.35927177
## 0 0.68452632 0.6418618 0.33471759
## 0 0.73710526 0.6352200 0.31832804
## 0 0.78968421 0.6307756 0.30720022
## 0 0.84226316 0.6263800 0.29777058
## 0 0.89484211 0.6220322 0.28739881
## 0 0.94742105 0.6220322 0.28739881
## 0 1.00000000 0.6220322 0.28682520
## 1 0.00100000 0.6240596 0.37352512
## 1 0.05357895 0.5187546 0.05457313
## 1 0.10615789 0.5142862 0.00000000
## 1 0.15873684 0.5142862 0.00000000
## 1 0.21131579 0.5142862 0.00000000
## 1 0.26389474 0.5142862 0.00000000
## 1 0.31647368 0.5142862 0.00000000
## 1 0.36905263 0.5142862 0.00000000
## 1 0.42163158 0.5142862 0.00000000
## 1 0.47421053 0.5142862 0.00000000
## 1 0.52678947 0.5142862 0.00000000
## 1 0.57936842 0.5142862 0.00000000
## 1 0.63194737 0.5142862 0.00000000
## 1 0.68452632 0.5142862 0.00000000
## 1 0.73710526 0.5142862 0.00000000
## 1 0.78968421 0.5142862 0.00000000
## 1 0.84226316 0.5142862 0.00000000
## 1 0.89484211 0.5142862 0.00000000
## 1 0.94742105 0.5142862 0.00000000
## 1 1.00000000 0.5142862 0.00000000
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0 and lambda = 0.2638947.
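One detail worth noting in the table: at alpha = 1 (pure lasso) every lambda above about 0.106 yields the same accuracy, 0.5142862, with Kappa 0. That is the no-information rate: the penalty shrinks all coefficients to zero, so the model predicts the majority class (MCI) for every observation, and the CV accuracy collapses to the majority-class share (fold averaging explains the tiny difference from an exact fraction). A quick check with hypothetical class counts for a 455-sample split (the exact training counts are not printed above):

```r
# Degenerate lasso fit = intercept-only model = majority-class prediction.
# Hypothetical class counts summing to the 455 training samples:
counts <- c(CN = 155, Dementia = 66, MCI = 234)
majority_rate <- max(counts) / sum(counts)
round(majority_rate, 7)   # compare the flat accuracy rows at alpha = 1 above
```

This is also why Kappa is exactly 0 on those rows: a constant prediction carries no agreement beyond chance.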
mean_accuracy_elastic_net_model1 <- mean(elastic_net_model1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_elastic_net_model1)
## [1] 0.5868408
FeatEval_Freq_mean_accuracy_cv_ENM1<-mean_accuracy_elastic_net_model1
print(FeatEval_Freq_mean_accuracy_cv_ENM1)
## [1] 0.5868408
train_predictions <- predict(elastic_net_model1, newdata = trainData_ENM1, type = "raw")
train_accuracy <- mean(train_predictions == trainData_ENM1$DX)
FeatEval_Freq_ENM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.863736263736264"
print(FeatEval_Freq_ENM1_trainAccuracy)
## [1] 0.8637363
predictions <- predict(elastic_net_model1, newdata = testData_ENM1)
cm_FeatEval_Freq_ENM1 <- caret::confusionMatrix(predictions,testData_ENM1$DX)
print(cm_FeatEval_Freq_ENM1)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN Dementia MCI
## CN 45 5 13
## Dementia 0 8 0
## MCI 21 15 86
##
## Overall Statistics
##
## Accuracy : 0.7202
## 95% CI : (0.6512, 0.7823)
## No Information Rate : 0.513
## P-Value [Acc > NIR] : 3.473e-09
##
## Kappa : 0.4987
##
## Mcnemar's Test P-Value : 6.901e-05
##
## Statistics by Class:
##
## Class: CN Class: Dementia Class: MCI
## Sensitivity 0.6818 0.28571 0.8687
## Specificity 0.8583 1.00000 0.6170
## Pos Pred Value 0.7143 1.00000 0.7049
## Neg Pred Value 0.8385 0.89189 0.8169
## Prevalence 0.3420 0.14508 0.5130
## Detection Rate 0.2332 0.04145 0.4456
## Detection Prevalence 0.3264 0.04145 0.6321
## Balanced Accuracy 0.7700 0.64286 0.7429
cm_FeatEval_Freq_ENM1_Accuracy<-cm_FeatEval_Freq_ENM1$overall["Accuracy"]
cm_FeatEval_Freq_ENM1_Kappa<-cm_FeatEval_Freq_ENM1$overall["Kappa"]
print(cm_FeatEval_Freq_ENM1_Accuracy)
## Accuracy
## 0.7202073
print(cm_FeatEval_Freq_ENM1_Kappa)
## Kappa
## 0.4986772
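As a sanity check, the overall accuracy and Cohen's Kappa that caret reports can be recomputed directly from the confusion matrix printed above (rows are predictions, columns the reference):

```r
# Confusion matrix copied from the caret output above
cm <- matrix(c(45,  5, 13,
                0,  8,  0,
               21, 15, 86),
             nrow = 3, byrow = TRUE,
             dimnames = list(pred = c("CN", "Dementia", "MCI"),
                             ref  = c("CN", "Dementia", "MCI")))

n   <- sum(cm)                                 # 193 test samples
p_o <- sum(diag(cm)) / n                       # observed agreement (accuracy)
p_e <- sum(rowSums(cm) * colSums(cm)) / n^2    # agreement expected by chance
kappa <- (p_o - p_e) / (1 - p_e)
round(c(accuracy = p_o, kappa = kappa), 4)     # accuracy 0.7202, kappa 0.4987
```

Both values match the `cm_FeatEval_Freq_ENM1` summary extracted below.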
importance_elastic_net_model1<- varImp(elastic_net_model1)
print(importance_elastic_net_model1)
## glmnet variable importance
##
## variables are sorted by maximum importance across the classes
## only 20 most important variables shown (out of 155)
##
## CN Dementia MCI
## PC1 86.62 100.000 13.316
## PC2 68.41 88.605 20.130
## cg00962106 72.97 12.359 60.543
## cg02225060 43.13 18.828 62.025
## cg02981548 49.97 8.974 59.004
## cg23432430 57.29 15.760 41.467
## cg14710850 54.50 8.363 46.076
## cg16749614 20.68 33.684 54.424
## cg07152869 48.29 54.289 5.937
## cg08857872 29.00 24.415 53.478
## cg16652920 27.04 25.381 52.480
## cg26948066 51.16 42.093 9.006
## PC3 12.11 38.679 50.851
## cg08861434 48.60 1.032 49.700
## cg27452255 49.50 29.762 19.675
## cg09584650 48.11 20.546 27.504
## cg11133939 31.92 15.802 47.781
## cg19503462 47.24 44.926 2.252
## cg06864789 20.57 46.479 25.849
## cg02372404 30.74 14.684 45.487
plot(importance_elastic_net_model1, top = 20, main = "Variable Importance Plot")
importance_elastic_net_model1_df<-importance_elastic_net_model1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 || METHOD_FEATURE_FLAG==6 ){
importance_elastic_net_final_model1 <- varImp(elastic_net_model1$finalModel)
library(dplyr)
Ordered_importance_elastic_net_final_model1 <- importance_elastic_net_final_model1 %>% arrange(desc(Overall))
print(Ordered_importance_elastic_net_final_model1)
}
if(METHOD_FEATURE_FLAG==1){
# For the multi-class case, take the maximum
# importance value across the classes for each feature
# and add it as a MaxImportance column
importance_elastic_net_model1_df$Feature<-rownames(importance_elastic_net_model1_df)
importance_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_elastic_net_model1_df)
}
## CN Dementia MCI Feature MaxImportance
## 1 86.62050902 1.000000e+02 13.3160630 PC1 100.0000000
## 2 68.41216879 8.860540e+01 20.1298078 PC2 88.6054047
## 3 72.96575230 1.235886e+01 60.5434683 cg00962106 72.9657523
## 4 43.13322979 1.882820e+01 62.0248574 cg02225060 62.0248574
## 5 49.96585084 8.974481e+00 59.0037601 cg02981548 59.0037601
## 6 57.29080134 1.576006e+01 41.4673182 cg23432430 57.2908013
## 7 54.50228710 8.362895e+00 46.0759636 cg14710850 54.5022871
## 8 20.67688605 3.368387e+01 54.4241861 cg16749614 54.4241861
## 9 48.28878775 5.428948e+01 5.9372679 cg07152869 54.2894836
## 10 28.99951869 2.441523e+01 53.4781726 cg08857872 53.4781726
## 11 27.03639026 2.538054e+01 52.4803598 cg16652920 52.4803598
## 12 51.16229060 4.209275e+01 9.0061160 cg26948066 51.1622906
## 13 12.10910616 3.867890e+01 50.8514321 PC3 50.8514321
## 14 48.60433053 1.032191e+00 49.6999500 cg08861434 49.6999500
## 15 49.49968697 2.976165e+01 19.6746118 cg27452255 49.4996870
## 16 48.11340007 2.054619e+01 27.5037772 cg09584650 48.1134001
## 17 31.91638249 1.580156e+01 47.7813735 cg11133939 47.7813735
## 18 47.24092693 4.492567e+01 2.2518257 cg19503462 47.2409269
## 19 20.56659243 4.647858e+01 25.8485628 cg06864789 46.4785832
## 20 30.73957153 1.468363e+01 45.4866296 cg02372404 45.4866296
## 21 13.69533098 4.531490e+01 31.5561459 cg24859648 45.3149049
## 22 10.38181326 3.472108e+01 45.1663236 cg14527649 45.1663236
## 23 44.71184790 3.266512e+01 11.9832982 cg03982462 44.7118479
## 24 43.77940241 1.498799e+01 28.7279844 cg06536614 43.7794024
## 25 0.05859366 4.329987e+01 43.1778488 cg17186592 43.2998704
## 26 26.35094279 1.675569e+01 43.1700654 cg26219488 43.1700654
## 27 42.96455132 1.408192e+01 28.8191985 cg10240127 42.9645513
## 28 13.43785370 4.289988e+01 29.3985988 cg00247094 42.8998806
## 29 35.47328840 6.861122e+00 42.3978384 cg20685672 42.3978384
## 30 3.59329081 4.215453e+01 38.4978073 cg25259265 42.1545261
## 31 42.14098876 1.425796e+01 27.8195995 cg16715186 42.1409888
## 32 0.72641363 4.194272e+01 41.1528826 cg05096415 41.9427243
## 33 34.83565307 4.176144e+01 6.8623616 cg15775217 41.7614426
## 34 15.96506670 4.058693e+01 24.5584393 cg24861747 40.5869340
## 35 34.02724408 6.215796e+00 40.3064684 cg07028768 40.3064684
## 36 4.43188361 3.973411e+01 35.2388010 cg14924512 39.7341126
## 37 24.98036059 3.964332e+01 14.5995338 cg03084184 39.6433224
## 38 4.46860104 3.906794e+01 34.5359093 cg05570109 39.0679383
## 39 34.88126397 3.997365e+00 38.9420569 cg01921484 38.9420569
## 40 9.76423401 2.779011e+01 37.6177708 cg00154902 37.6177708
## 41 28.32432504 3.744025e+01 9.0524945 cg26757229 37.4402476
## 42 37.35444255 9.845102e+00 27.4459127 cg03660162 37.3544426
## 43 35.88197667 5.225476e-01 36.4679523 cg12228670 36.4679523
## 44 4.42287463 3.174001e+01 36.2263084 cg00616572 36.2263084
## 45 14.12090749 3.616528e+01 21.9809420 cg20507276 36.1652775
## 46 5.45819324 3.544696e+01 29.9253415 cg05841700 35.4469628
## 47 21.86551281 1.351449e+01 35.4434316 cg06715136 35.4434316
## 48 22.83649675 1.227241e+01 35.1723309 cg02621446 35.1723309
## 49 18.36622546 3.502290e+01 16.5932465 cg12738248 35.0228999
## 50 14.22686588 3.493731e+01 20.6470141 cg09854620 34.9373080
## 51 32.22108254 3.481801e+01 2.5335040 cg00322003 34.8180145
## 52 8.08392092 2.660860e+01 34.7559518 cg24873924 34.7559518
## 53 14.17904774 3.469767e+01 20.4551950 cg03129555 34.6976707
## 54 34.67519870 7.587119e+00 27.0246513 cg04412904 34.6751987
## 55 15.01097938 1.956984e+01 34.6442427 cg17738613 34.6442427
## 56 18.92309050 1.558852e+01 34.5750392 cg25879395 34.5750392
## 57 34.34052148 1.088586e+01 23.3912285 cg05234269 34.3405215
## 58 22.74814328 3.407060e+01 11.2590310 cg20913114 34.0706023
## 59 1.10432730 3.256996e+01 33.7377127 cg02494911 33.7377127
## 60 17.46539414 3.350897e+01 15.9801525 cg00675157 33.5089746
## 61 26.90531215 3.346397e+01 6.4952294 cg12279734 33.4639696
## 62 12.81006898 2.054691e+01 33.4204064 cg01153376 33.4204064
## 63 30.29072569 2.969637e+00 33.3237905 cg04248279 33.3237905
## 64 30.63924812 3.320584e+01 2.5031614 cg06697310 33.2058375
## 65 25.57263487 3.289020e+01 7.2541345 cg26474732 32.8901974
## 66 19.20126650 1.362518e+01 32.8898776 cg16771215 32.8898776
## 67 1.21419254 3.269657e+01 31.4189519 cg12534577 32.6965725
## 68 14.55277128 3.243786e+01 17.8216650 cg06378561 32.4378643
## 69 19.19190875 1.316032e+01 32.4156554 cg18819889 32.4156554
## 70 29.77425177 3.221985e+01 2.3821664 cg01013522 32.2198462
## 71 8.93886565 2.321185e+01 32.2141388 cg10369879 32.2141388
## 72 31.33704403 9.314914e+00 21.9587019 cg03327352 31.3370440
## 73 31.30160654 8.696956e+00 22.5412221 cg07138269 31.3016065
## 74 30.27943029 7.143411e-01 31.0571995 cg12146221 31.0571995
## 75 31.01419234 1.154350e+01 19.4072654 cg11227702 31.0141923
## 76 30.51179268 2.044969e-01 30.7797176 cg27577781 30.7797176
## 77 30.73490550 2.929819e+01 1.3732876 cg02356645 30.7349055
## 78 10.88639219 1.960561e+01 30.5554321 cg15865722 30.5554321
## 79 21.12443442 3.052442e+01 9.3365531 cg18339359 30.5244155
## 80 21.72330090 3.049950e+01 8.7127705 cg08584917 30.4994994
## 81 30.48083627 1.623341e+01 14.1840006 cg15535896 30.4808363
## 82 9.34548938 3.034926e+01 20.9403427 cg01680303 30.3492601
## 83 0.66026098 2.956826e+01 30.2919448 cg01667144 30.2919448
## 84 17.55646953 2.993258e+01 12.3126775 cg07523188 29.9325751
## 85 12.72027022 1.708320e+01 29.8669009 cg21854924 29.8669009
## 86 9.99154586 2.974237e+01 19.6873945 cg10750306 29.7423684
## 87 5.72162167 2.961493e+01 23.8298800 cg16579946 29.6149297
## 88 29.45266239 5.868392e+00 23.5208426 cg11438323 29.4526624
## 89 7.90101220 2.936462e+01 21.4001830 cg18821122 29.3646232
## 90 13.46830088 1.551925e+01 29.0509798 cg01128042 29.0509798
## 91 12.43894239 1.650670e+01 29.0090673 cg14564293 29.0090673
## 92 28.69826490 4.398695e-01 28.1949674 cg08198851 28.6982649
## 93 25.92092461 2.699178e+00 28.6835305 cg00696044 28.6835305
## 94 28.64621193 7.486104e+00 21.0966801 cg17421046 28.6462119
## 95 28.22427509 1.423333e+01 13.9275209 cg11331837 28.2242751
## 96 4.57983333 2.318275e+01 27.8260122 cg12682323 27.8260122
## 97 27.75391376 2.314589e+01 4.5445933 cg02932958 27.7539138
## 98 2.22968717 2.770318e+01 25.4100660 cg23658987 27.7031812
## 99 13.54188950 1.406081e+01 27.6661275 cg07480176 27.6661275
## 100 18.99323166 8.562681e+00 27.6193403 cg10738648 27.6193403
## 101 23.23802196 4.225899e+00 27.5273491 cg03071582 27.5273491
## 102 27.50544612 1.371725e+01 13.7247673 cg25758034 27.5054461
## 103 8.31672910 1.850480e+01 26.8849556 cg06118351 26.8849556
## 104 26.47262314 2.668285e+01 0.1468021 cg19512141 26.6828533
## 105 15.77531966 2.662511e+01 10.7863633 cg23161429 26.6251110
## 106 13.98131160 2.639430e+01 12.3495607 cg11247378 26.3943003
## 107 18.59047099 7.683583e+00 26.3374815 cg20678988 26.3374815
## 108 14.37104595 1.154409e+01 25.9785682 cg27086157 25.9785682
## 109 25.84351166 9.775209e+00 16.0048742 cg03088219 25.8435117
## 110 13.62701522 2.527486e+01 11.5844190 cg22274273 25.2748622
## 111 2.73162960 2.236009e+01 25.1551464 cg13885788 25.1551464
## 112 7.96956513 1.668287e+01 24.7158621 cg14240646 24.7158621
## 113 23.64352178 7.872467e-01 24.4941965 cg06112204 24.4941965
## 114 24.37778097 4.912532e+00 19.4018209 cg17429539 24.3777810
## 115 23.05300756 2.435067e+01 1.2342388 cg25561557 24.3506743
## 116 21.11637573 3.134858e+00 24.3146618 cg14293999 24.3146618
## 117 15.52345785 8.640339e+00 24.2272249 cg19377607 24.2272249
## 118 21.13573933 2.410962e+01 2.9104517 cg06950937 24.1096190
## 119 24.09497385 4.091576e+00 19.9399703 cg25436480 24.0949738
## 120 14.61521652 9.016215e+00 23.6948594 cg00272795 23.6948594
## 121 10.00915361 1.338532e+01 23.4578980 cg12012426 23.4578980
## 122 23.37900124 1.718161e+01 6.1339628 cg05321907 23.3790012
## 123 23.15395979 9.972972e+00 13.1175593 cg20139683 23.1539598
## 124 0.72092593 2.312580e+01 22.3414416 cg26069044 23.1257956
## 125 21.02472207 2.241615e+01 1.3279969 cg23916408 22.4161470
## 126 0.60251447 2.222861e+01 21.5626641 cg27341708 22.2286066
## 127 15.97348851 2.221336e+01 6.1764438 cg13080267 22.2133604
## 128 21.86060829 1.296764e+00 20.5004165 cg27272246 21.8606083
## 129 0.95815549 2.184265e+01 20.8210630 cg12284872 21.8426466
## 130 2.40865385 2.169918e+01 19.2270978 cg00689685 21.6991797
## 131 2.01195112 2.152617e+01 19.4507897 cg16178271 21.5261688
## 132 21.27759398 8.124505e+00 13.0896611 cg21209485 21.2775940
## 133 20.58951338 1.059009e+01 9.9359937 cg24851651 20.5895134
## 134 20.33617165 7.329384e+00 12.9433597 cg21697769 20.3361716
## 135 20.32987702 6.214241e+00 14.0522083 cg04664583 20.3298770
## 136 14.63862353 1.993277e+01 5.2307158 cg00999469 19.9327674
## 137 2.27018549 1.742740e+01 19.7610183 cg20370184 19.7610183
## 138 18.97878439 4.183644e+00 14.7317123 cg11187460 18.9787844
## 139 18.43650302 1.998897e+00 16.3741776 cg12784167 18.4365030
## 140 1.20148356 1.698306e+01 18.2479666 cg02320265 18.2479666
## 141 17.49071043 1.357646e+01 3.8508273 cg12776173 17.4907104
## 142 17.27589672 1.271058e+00 15.9414108 cg08779649 17.2758967
## 143 8.18262162 8.988192e+00 17.2342417 cg01933473 17.2342417
## 144 17.18150883 8.948544e+00 8.1695367 cg15501526 17.1815088
## 145 13.77131645 1.693226e+01 3.0975134 cg10985055 16.9322578
## 146 16.16400549 6.749876e+00 9.3507013 cg17906851 16.1640055
## 147 11.29843847 4.708384e+00 16.0702505 cg14307563 16.0702505
## 148 4.33311418 1.431148e+01 9.9149370 cg16788319 14.3114792
## 149 11.34762178 1.384129e+01 2.4302450 cg24506579 13.8412948
## 150 9.52142351 1.242079e+01 2.8359369 cg27639199 12.4207884
## 151 1.91220728 1.029544e+01 12.2710740 cg12466610 12.2710740
## 152 9.00483857 2.188811e+00 11.2570774 cg15633912 11.2570774
## 153 0.00000000 1.116831e+01 11.2317426 cg01413796 11.2317426
## 154 1.45779360 1.885462e-01 1.7097678 cg01549082 1.7097678
## 155 0.70732781 5.928875e-03 0.7766847 age.now 0.7766847
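The ranking above comes from collapsing the three per-class columns with `pmax`, which takes row-wise maxima. On the first two rows of the table, the step looks like this in isolation:

```r
# Reproducing the MaxImportance step on two rows of the table above
imp <- data.frame(
  CN       = c(86.62, 43.13),
  Dementia = c(100.00, 18.83),
  MCI      = c(13.32, 62.02),
  row.names = c("PC1", "cg02225060")
)
imp$MaxImportance <- pmax(imp$CN, imp$Dementia, imp$MCI)  # row-wise maximum
imp[order(-imp$MaxImportance), ]  # PC1 (100) ranks above cg02225060 (62.02)
```

A feature therefore ranks highly if it matters strongly for any one class, even when it is unimportant for the other two.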
if(METHOD_FEATURE_FLAG == 1){
importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_elastic_net_model1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
## Warning: The melt generic in data.table has been passed a data.frame and will attempt to
## redirect to the relevant reshape2 method; please note that reshape2 is superseded and is no
## longer actively developed, and this redirection is now deprecated. To continue using melt
## methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the
## namespace, i.e. reshape2::melt(.). In the next version, this warning will become an error.
if(METHOD_FEATURE_FLAG == 1){
print(importance_elastic_net_model1_df %>% head(20))
print("the top 20 features based on max way:")
print(head(importance_elastic_net_model1_df,n=20)$Feature)
importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_elastic_net_model1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
## CN Dementia MCI Feature MaxImportance
## 1 86.62051 100.000000 13.316063 PC1 100.00000
## 2 68.41217 88.605405 20.129808 PC2 88.60540
## 3 72.96575 12.358856 60.543468 cg00962106 72.96575
## 4 43.13323 18.828200 62.024857 cg02225060 62.02486
## 5 49.96585 8.974481 59.003760 cg02981548 59.00376
## 6 57.29080 15.760055 41.467318 cg23432430 57.29080
## 7 54.50229 8.362895 46.075964 cg14710850 54.50229
## 8 20.67689 33.683872 54.424186 cg16749614 54.42419
## 9 48.28879 54.289484 5.937268 cg07152869 54.28948
## 10 28.99952 24.415226 53.478173 cg08857872 53.47817
## 11 27.03639 25.380542 52.480360 cg16652920 52.48036
## 12 51.16229 42.092747 9.006116 cg26948066 51.16229
## 13 12.10911 38.678898 50.851432 PC3 50.85143
## 14 48.60433 1.032191 49.699950 cg08861434 49.69995
## 15 49.49969 29.761647 19.674612 cg27452255 49.49969
## 16 48.11340 20.546195 27.503777 cg09584650 48.11340
## 17 31.91638 15.801563 47.781374 cg11133939 47.78137
## 18 47.24093 44.925673 2.251826 cg19503462 47.24093
## 19 20.56659 46.478583 25.848563 cg06864789 46.47858
## 20 30.73957 14.683630 45.486630 cg02372404 45.48663
## [1] "the top 20 features based on max way:"
## [1] "PC1" "PC2" "cg00962106" "cg02225060" "cg02981548" "cg23432430" "cg14710850"
## [8] "cg16749614" "cg07152869" "cg08857872" "cg16652920" "cg26948066" "PC3" "cg08861434"
## [15] "cg27452255" "cg09584650" "cg11133939" "cg19503462" "cg06864789" "cg02372404"
## Warning: The melt generic in data.table has been passed a data.frame and will attempt to
## redirect to the relevant reshape2 method; please note that reshape2 is superseded and is no
## longer actively developed, and this redirection is now deprecated. To continue using melt
## methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the
## namespace, i.e. reshape2::melt(.). In the next version, this warning will become an error.
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Freq_ENM1_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4||METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Freq_ENM1_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Freq_ENM1_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==1){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData_ENM1$DX)
for (class in classes) {
binary_labels <- ifelse(testData_ENM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
roc_cols <- c("blue", "red", "green3")   # one color per class, reused in the legend
plot(roc_curves[[1]], col = roc_cols[1],
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = roc_cols[i], lwd = 2)
}
legend("bottomright", legend = classes, col = roc_cols, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.8682
## The AUC value for class CN is: 0.8681699
##
## Class: Dementia
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.8656
## The AUC value for class Dementia is: 0.8655844
##
## Class: MCI
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.8361
## The AUC value for class MCI is: 0.8361272
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Freq_ENM1_AUC<-mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.8566272
print(FeatEval_Freq_ENM1_AUC)
## [1] 0.8566272
library(caret)
library(xgboost)
library(dplyr)
library(doParallel)
numCores <- detectCores() - 1
c2 <- makeCluster(numCores)
registerDoParallel(c2)
df_XGB1<-processed_data
featureName_XGB1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_XGB1$DX, p = 0.7, list = FALSE)
trainData_XGB1<- df_XGB1[trainIndex, ]
testData_XGB1 <- df_XGB1[-trainIndex, ]
cv_control <- trainControl(method = "cv", number = 5, allowParallel = TRUE)
xgb_model <- caret::train(
DX ~ ., data = trainData_XGB1,
method = "xgbTree", trControl = cv_control,
metric = "Accuracy"
)
print(xgb_model)
## eXtreme Gradient Boosting
##
## 455 samples
## 155 predictors
## 3 classes: 'CN', 'Dementia', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 364, 365, 363, 364, 364
## Resampling results across tuning parameters:
##
## eta max_depth colsample_bytree subsample nrounds Accuracy Kappa
## 0.3 1 0.6 0.50 50 0.5626390 0.2014997
## 0.3 1 0.6 0.50 100 0.5624696 0.2158945
## 0.3 1 0.6 0.50 150 0.5469873 0.1948979
## 0.3 1 0.6 0.75 50 0.5274964 0.1270203
## 0.3 1 0.6 0.75 100 0.5715756 0.2190828
## 0.3 1 0.6 0.75 150 0.5693539 0.2286734
## 0.3 1 0.6 1.00 50 0.5210023 0.1059047
## 0.3 1 0.6 1.00 100 0.5452243 0.1644825
## 0.3 1 0.6 1.00 150 0.5540405 0.1910353
## 0.3 1 0.8 0.50 50 0.5648851 0.1998096
## 0.3 1 0.8 0.50 100 0.5999310 0.2842385
## 0.3 1 0.8 0.50 150 0.5956554 0.2794328
## 0.3 1 0.8 0.75 50 0.5430520 0.1516979
## 0.3 1 0.8 0.75 100 0.5649578 0.2061207
## 0.3 1 0.8 0.75 150 0.5605372 0.2043284
## 0.3 1 0.8 1.00 50 0.5232468 0.1074828
## 0.3 1 0.8 1.00 100 0.5474954 0.1693409
## 0.3 1 0.8 1.00 150 0.5737490 0.2274695
## 0.3 2 0.6 0.50 50 0.5407788 0.1714629
## 0.3 2 0.6 0.50 100 0.5407554 0.1718381
## 0.3 2 0.6 0.50 150 0.5495472 0.1892652
## 0.3 2 0.6 0.75 50 0.5648134 0.2046993
## 0.3 2 0.6 0.75 100 0.5913569 0.2486865
## 0.3 2 0.6 0.75 150 0.5847879 0.2443570
## 0.3 2 0.6 1.00 50 0.5363853 0.1443827
## 0.3 2 0.6 1.00 100 0.5628327 0.1953351
## 0.3 2 0.6 1.00 150 0.5826623 0.2366868
## 0.3 2 0.8 0.50 50 0.5890614 0.2501737
## 0.3 2 0.8 0.50 100 0.5824935 0.2452226
## 0.3 2 0.8 0.50 150 0.5912842 0.2616974
## 0.3 2 0.8 0.75 50 0.5692573 0.1988720
## 0.3 2 0.8 0.75 100 0.5693051 0.2036385
## 0.3 2 0.8 0.75 150 0.5736529 0.2183813
## 0.3 2 0.8 1.00 50 0.5386065 0.1523143
## 0.3 2 0.8 1.00 100 0.5386542 0.1511371
## 0.3 2 0.8 1.00 150 0.5694261 0.2109708
## 0.3 3 0.6 0.50 50 0.5668912 0.2142216
## 0.3 3 0.6 0.50 100 0.5999320 0.2739081
## 0.3 3 0.6 0.50 150 0.6000287 0.2767759
## 0.3 3 0.6 0.75 50 0.5827101 0.2269304
## 0.3 3 0.6 0.75 100 0.5893035 0.2437535
## 0.3 3 0.6 0.75 150 0.5870569 0.2464742
## 0.3 3 0.6 1.00 50 0.5626390 0.1938117
## 0.3 3 0.6 1.00 100 0.5603440 0.1981710
## 0.3 3 0.6 1.00 150 0.5648373 0.2066440
## 0.3 3 0.8 0.50 50 0.5651516 0.2022861
## 0.3 3 0.8 0.50 100 0.5825174 0.2372226
## 0.3 3 0.8 0.50 150 0.5913086 0.2593325
## 0.3 3 0.8 0.75 50 0.5735802 0.2132218
## 0.3 3 0.8 0.75 100 0.5779763 0.2221561
## 0.3 3 0.8 0.75 150 0.5890136 0.2432604
## 0.3 3 0.8 1.00 50 0.5627600 0.1925487
## 0.3 3 0.8 1.00 100 0.5626140 0.1938372
## 0.3 3 0.8 1.00 150 0.5648851 0.2019155
## 0.4 1 0.6 0.50 50 0.5518193 0.1871534
## 0.4 1 0.6 0.50 100 0.5606355 0.2125577
## 0.4 1 0.6 0.50 150 0.5759473 0.2497633
## 0.4 1 0.6 0.75 50 0.5407804 0.1628039
## 0.4 1 0.6 0.75 100 0.5496443 0.1941811
## 0.4 1 0.6 0.75 150 0.5805118 0.2549604
## 0.4 1 0.6 1.00 50 0.5474237 0.1603859
## 0.4 1 0.6 1.00 100 0.5628078 0.2050572
## 0.4 1 0.6 1.00 150 0.5693284 0.2210329
## 0.4 1 0.8 0.50 50 0.5497409 0.1949540
## 0.4 1 0.8 0.50 100 0.5542093 0.2109399
## 0.4 1 0.8 0.50 150 0.5694739 0.2409194
## 0.4 1 0.8 0.75 50 0.5607315 0.1935090
## 0.4 1 0.8 0.75 100 0.5605855 0.2048830
## 0.4 1 0.8 0.75 150 0.5716728 0.2312368
## 0.4 1 0.8 1.00 50 0.5430514 0.1517067
## 0.4 1 0.8 1.00 100 0.5694012 0.2179684
## 0.4 1 0.8 1.00 150 0.5781929 0.2415106
## 0.4 2 0.6 0.50 50 0.5627106 0.2286323
## 0.4 2 0.6 0.50 100 0.5474226 0.1999111
## 0.4 2 0.6 0.50 150 0.5561671 0.2123446
## 0.4 2 0.6 0.75 50 0.5604656 0.2094602
## 0.4 2 0.6 0.75 100 0.5628078 0.2120169
## 0.4 2 0.6 0.75 150 0.5628333 0.2107193
## 0.4 2 0.6 1.00 50 0.5627350 0.2019805
## 0.4 2 0.6 1.00 100 0.5912831 0.2535422
## 0.4 2 0.6 1.00 150 0.5693529 0.2161640
## 0.4 2 0.8 0.50 50 0.5736035 0.2274106
## 0.4 2 0.8 0.50 100 0.5781680 0.2491219
## 0.4 2 0.8 0.50 150 0.5781202 0.2492017
## 0.4 2 0.8 0.75 50 0.5563115 0.1918094
## 0.4 2 0.8 0.75 100 0.5671556 0.2152489
## 0.4 2 0.8 0.75 150 0.5649573 0.2131570
## 0.4 2 0.8 1.00 50 0.5670101 0.2060126
## 0.4 2 0.8 1.00 100 0.5846902 0.2403611
## 0.4 2 0.8 1.00 150 0.5826129 0.2415605
## 0.4 3 0.6 0.50 50 0.5934103 0.2544464
## 0.4 3 0.6 0.50 100 0.5714795 0.2209056
## 0.4 3 0.6 0.50 150 0.5803196 0.2390760
## 0.4 3 0.6 0.75 50 0.5826368 0.2398459
## 0.4 3 0.6 0.75 100 0.5759457 0.2286999
## 0.4 3 0.6 0.75 150 0.5914286 0.2583358
## 0.4 3 0.6 1.00 50 0.5825434 0.2358005
## 0.4 3 0.6 1.00 100 0.5870574 0.2455986
## 0.4 3 0.6 1.00 150 0.5848357 0.2434322
## 0.4 3 0.8 0.50 50 0.5693773 0.2196675
## 0.4 3 0.8 0.50 100 0.5694256 0.2247098
## 0.4 3 0.8 0.50 150 0.5606339 0.2137666
## 0.4 3 0.8 0.75 50 0.5648123 0.1993573
## 0.4 3 0.8 0.75 100 0.5779753 0.2355511
## 0.4 3 0.8 0.75 150 0.5736035 0.2285789
## 0.4 3 0.8 1.00 50 0.5870584 0.2364816
## 0.4 3 0.8 1.00 100 0.5671795 0.2040362
## 0.4 3 0.8 1.00 150 0.5803907 0.2318105
##
## Tuning parameter 'gamma' was held constant at a value of 0
## Tuning parameter
## 'min_child_weight' was held constant at a value of 1
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were nrounds = 150, max_depth = 3, eta = 0.3, gamma =
## 0, colsample_bytree = 0.6, min_child_weight = 1 and subsample = 0.5.
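The footer above notes that gamma and min_child_weight were held constant: caret's default xgbTree search only varies the other five parameters. If a different search is wanted, an explicit grid can be supplied through `tuneGrid`; a sketch reproducing the 108-combination default shown above (column names must match caret's xgbTree parameter set exactly):

```r
# Explicit tuning grid matching the default xgbTree search printed above
xgb_grid <- expand.grid(
  nrounds          = c(50, 100, 150),
  max_depth        = 1:3,
  eta              = c(0.3, 0.4),
  gamma            = 0,              # held constant, as in the output above
  colsample_bytree = c(0.6, 0.8),
  min_child_weight = 1,              # held constant, as in the output above
  subsample        = c(0.5, 0.75, 1)
)
nrow(xgb_grid)   # 108 combinations, one row per line of the table above
# then: caret::train(..., method = "xgbTree", tuneGrid = xgb_grid)
```

Widening gamma or min_child_weight in this grid is the natural next step if the training accuracy of 1 below suggests overfitting.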
mean_accuracy_xgb_model<- mean(xgb_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_xgb_model)
## [1] 0.5677591
FeatEval_Freq_mean_accuracy_cv_xgb<-mean_accuracy_xgb_model
print(FeatEval_Freq_mean_accuracy_cv_xgb)
## [1] 0.5677591
train_predictions <- predict(xgb_model, newdata = trainData_XGB1, type = "raw")
train_accuracy <- mean(train_predictions == trainData_XGB1$DX)
FeatEval_Freq_xgb_trainAccuracy <- train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
print(FeatEval_Freq_xgb_trainAccuracy)
## [1] 1
predictions <- predict(xgb_model, newdata = testData_XGB1)
cm_FeatEval_Freq_xgb <-caret::confusionMatrix(predictions,testData_XGB1$DX)
print(cm_FeatEval_Freq_xgb)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN Dementia MCI
## CN 38 6 15
## Dementia 1 5 1
## MCI 27 17 83
##
## Overall Statistics
##
## Accuracy : 0.6528
## 95% CI : (0.5811, 0.7198)
## No Information Rate : 0.513
## P-Value [Acc > NIR] : 5.951e-05
##
## Kappa : 0.3719
##
## Mcnemar's Test P-Value : 9.466e-05
##
## Statistics by Class:
##
## Class: CN Class: Dementia Class: MCI
## Sensitivity 0.5758 0.17857 0.8384
## Specificity 0.8346 0.98788 0.5319
## Pos Pred Value 0.6441 0.71429 0.6535
## Neg Pred Value 0.7910 0.87634 0.7576
## Prevalence 0.3420 0.14508 0.5130
## Detection Rate 0.1969 0.02591 0.4301
## Detection Prevalence 0.3057 0.03627 0.6580
## Balanced Accuracy 0.7052 0.58323 0.6851
cm_FeatEval_Freq_xgb_Accuracy <-cm_FeatEval_Freq_xgb$overall["Accuracy"]
cm_FeatEval_Freq_xgb_Kappa <-cm_FeatEval_Freq_xgb$overall["Kappa"]
print(cm_FeatEval_Freq_xgb_Accuracy)
## Accuracy
## 0.6528497
print(cm_FeatEval_Freq_xgb_Kappa)
## Kappa
## 0.3718547
importance_xgb_model<- varImp(xgb_model)
print(importance_xgb_model)
## xgbTree variable importance
##
## only 20 most important variables shown (out of 155)
##
## Overall
## age.now 100.00
## cg08857872 90.21
## cg00962106 86.79
## cg14564293 80.36
## cg15501526 79.95
## cg02225060 78.88
## cg00154902 77.96
## cg02621446 69.80
## cg25259265 63.43
## cg05096415 61.27
## cg01013522 59.75
## cg16771215 58.96
## cg05234269 58.59
## cg02981548 55.91
## cg26948066 55.78
## cg00696044 55.74
## cg04248279 54.60
## cg17186592 53.42
## cg01933473 50.41
## cg01153376 49.92
plot(importance_xgb_model, top = 20, main = "Variable Importance Plot")
importance_xgb_model_df<-importance_xgb_model$importance
importance <- xgb.importance(model = xgb_model$finalModel)
xgb.plot.importance(importance_matrix = importance)
ordered_importance <- importance[order(-importance$Importance), ]
print(ordered_importance)
## Feature Gain Cover Frequency Importance
## <char> <num> <num> <num> <num>
## 1: age.now 0.0206450994 0.0258977236 0.012645422 0.0206450994
## 2: cg08857872 0.0186438510 0.0142802565 0.010116338 0.0186438510
## 3: cg00962106 0.0179437506 0.0144742747 0.017197774 0.0179437506
## 4: cg14564293 0.0166285957 0.0163139863 0.009104704 0.0166285957
## 5: cg15501526 0.0165441027 0.0120930070 0.008598887 0.0165441027
## ---
## 151: cg20507276 0.0011166602 0.0026583831 0.005058169 0.0011166602
## 152: cg27577781 0.0010477016 0.0024262331 0.003540718 0.0010477016
## 153: cg10750306 0.0009064883 0.0014152294 0.004046535 0.0009064883
## 154: cg13080267 0.0008719851 0.0012592102 0.003540718 0.0008719851
## 155: cg04664583 0.0001936603 0.0006596944 0.002529084 0.0001936603
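The `xgb.importance()` table above ranks features by Gain (each feature's contribution to the model's splits). Top-N selection from such a table can be sketched as follows; the table below is a made-up miniature with illustrative feature names, not the actual model output:

```r
# Toy importance table mirroring the Feature/Gain columns of xgb.importance();
# values and CpG-style names are invented for illustration.
imp <- data.frame(
  Feature = c("age.now", "cg_demo_A", "cg_demo_B", "cg_demo_C"),
  Gain    = c(0.021, 0.019, 0.012, 0.004)
)
# Rank by Gain (descending) and keep the top N features
N <- 2
top_features <- head(imp[order(-imp$Gain), "Feature"], N)
print(top_features)   # "age.now" "cg_demo_A"
```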
stopCluster(c2)
registerDoSEQ()
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
FeatEval_Freq_xgb_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
FeatEval_Freq_xgb_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
FeatEval_Freq_xgb_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData_XGB1$DX)
for (class in classes) {
binary_labels <- ifelse(testData_XGB1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = 2,   # col 2 so the first curve matches the legend
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.7814
## The AUC value for class CN is: 0.7814364
##
## Class: Dementia
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.687
## The AUC value for class Dementia is: 0.687013
##
## Class: MCI
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.7393
## The AUC value for class MCI is: 0.739308
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Freq_xgb_AUC <- mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.7359191
print(FeatEval_Freq_xgb_AUC)
## [1] 0.7359191
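The macro-averaging of one-versus-rest AUCs used above can be reproduced without pROC via the rank-sum (Mann-Whitney) identity; a minimal sketch on made-up labels and class probabilities:

```r
# One-vs-rest AUC via the rank-sum identity:
# AUC = (sum of case ranks - n1*(n1+1)/2) / (n0 * n1)
ovr_auc <- function(binary_labels, scores) {
  r  <- rank(scores)                 # ties get averaged ranks
  n1 <- sum(binary_labels == 1)
  n0 <- sum(binary_labels == 0)
  (sum(r[binary_labels == 1]) - n1 * (n1 + 1) / 2) / (n0 * n1)
}
# Toy test labels and predicted class probabilities (invented values)
labels <- c("CN", "CN", "MCI", "MCI", "Dementia", "Dementia")
probs <- cbind(
  CN       = c(0.8, 0.3, 0.4, 0.2, 0.1, 0.1),
  MCI      = c(0.1, 0.5, 0.5, 0.6, 0.2, 0.1),
  Dementia = c(0.1, 0.2, 0.1, 0.2, 0.7, 0.8)
)
# One AUC per class, then the macro average across classes
aucs <- sapply(colnames(probs), function(cl)
  ovr_auc(as.integer(labels == cl), probs[, cl]))
print(mean(aucs))   # 0.9375
```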
library(caret)
library(randomForest)
df_RFM1<-processed_data
featureName_RFM1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_RFM1$DX, p = 0.7, list = FALSE)
train_data_RFM1 <- df_RFM1[trainIndex, ]
test_data_RFM1 <- df_RFM1[-trainIndex, ]
X_train_RFM1 <- subset(train_data_RFM1, select = -DX)
y_train_RFM1 <- train_data_RFM1$DX
X_test_RFM1 <- subset(test_data_RFM1, select = -DX)
y_test_RFM1 <- test_data_RFM1$DX
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)
rf_model <- caret::train(
DX ~ ., data = train_data_RFM1,
method = "rf", trControl = ctrl,
metric = "Accuracy",
importance = TRUE
)
print(rf_model)
## Random Forest
##
## 455 samples
## 155 predictors
## 3 classes: 'CN', 'Dementia', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 364, 365, 363, 364, 364
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.5253246 0.02762106
## 78 0.5560471 0.13321329
## 155 0.5604916 0.14054833
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 155.
mean_accuracy_rf_model<- mean(rf_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_rf_model)
## [1] 0.5472878
FeatEval_Freq_mean_accuracy_cv_rf<-mean_accuracy_rf_model
print(FeatEval_Freq_mean_accuracy_cv_rf)
## [1] 0.5472878
train_predictions <- predict(rf_model, newdata = train_data_RFM1, type = "raw")
train_accuracy <- mean(train_predictions == train_data_RFM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
FeatEval_Freq_rf_trainAccuracy<-train_accuracy
print(FeatEval_Freq_rf_trainAccuracy)
## [1] 1
predictions <- predict(rf_model, newdata = test_data_RFM1)
cm_FeatEval_Freq_rf<-caret::confusionMatrix(predictions,test_data_RFM1$DX)
print(cm_FeatEval_Freq_rf)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN Dementia MCI
## CN 17 7 10
## Dementia 0 0 0
## MCI 49 21 89
##
## Overall Statistics
##
## Accuracy : 0.5492
## 95% CI : (0.4761, 0.6208)
## No Information Rate : 0.513
## P-Value [Acc > NIR] : 0.1747
##
## Kappa : 0.1284
##
## Mcnemar's Test P-Value : 1.25e-11
##
## Statistics by Class:
##
## Class: CN Class: Dementia Class: MCI
## Sensitivity 0.25758 0.0000 0.8990
## Specificity 0.86614 1.0000 0.2553
## Pos Pred Value 0.50000 NaN 0.5597
## Neg Pred Value 0.69182 0.8549 0.7059
## Prevalence 0.34197 0.1451 0.5130
## Detection Rate 0.08808 0.0000 0.4611
## Detection Prevalence 0.17617 0.0000 0.8238
## Balanced Accuracy 0.56186 0.5000 0.5772
cm_FeatEval_Freq_rf_Accuracy<-cm_FeatEval_Freq_rf$overall["Accuracy"]
print(cm_FeatEval_Freq_rf_Accuracy)
## Accuracy
## 0.5492228
cm_FeatEval_Freq_rf_Kappa<-cm_FeatEval_Freq_rf$overall["Kappa"]
print(cm_FeatEval_Freq_rf_Kappa)
## Kappa
## 0.1283742
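The Kappa reported by `confusionMatrix()` is Cohen's kappa, i.e. agreement corrected for chance; a minimal base-R sketch with made-up counts:

```r
# Cohen's kappa from a confusion matrix (rows = predicted, cols = reference);
# the counts below are invented for illustration.
cm <- matrix(c(40, 10, 5, 45), nrow = 2,
             dimnames = list(pred = c("A", "B"), ref = c("A", "B")))
n  <- sum(cm)
po <- sum(diag(cm)) / n                       # observed agreement
pe <- sum(rowSums(cm) * colSums(cm)) / n^2    # agreement expected by chance
kappa <- (po - pe) / (1 - pe)
print(kappa)   # 0.7
```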
importance_rf_model <- varImp(rf_model)
print(importance_rf_model)
## rf variable importance
##
## variables are sorted by maximum importance across the classes
## only 20 most important variables shown (out of 155)
##
## CN Dementia MCI
## cg15501526 76.56 12.67 100.00
## age.now 47.68 49.97 65.92
## cg08857872 28.33 35.28 60.89
## cg01153376 19.27 31.95 59.45
## cg04412904 47.79 29.66 30.59
## cg11331837 28.17 35.84 47.43
## cg12279734 25.04 47.29 24.29
## cg23658987 46.86 24.79 27.31
## cg10240127 45.23 13.20 26.12
## cg27086157 29.79 11.55 44.90
## cg02621446 44.82 29.81 42.28
## cg00154902 22.94 43.98 41.47
## cg24506579 25.43 43.76 19.06
## cg00689685 35.66 41.97 19.30
## cg08198851 34.44 13.61 41.38
## cg10738648 40.75 21.39 34.86
## cg03129555 40.53 21.07 13.68
## cg02320265 11.83 33.98 40.30
## cg12228670 39.21 16.46 40.15
## cg25259265 30.86 36.43 39.91
plot(importance_rf_model, top = 20, main = "Variable Importance Plot")
importance_rf_model_df<-importance_rf_model$importance
if( METHOD_FEATURE_FLAG==5){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(MCI))
print(Ordered_importance_rf_final_model)
}
if( METHOD_FEATURE_FLAG==4||METHOD_FEATURE_FLAG==6){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(Dementia))
print(Ordered_importance_rf_final_model)
}
if( METHOD_FEATURE_FLAG==3){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(CI))
print(Ordered_importance_rf_final_model)
}
if(METHOD_FEATURE_FLAG==1){
# for the multi classification case,
# for each feature, we will choose the maximum importance value
# Add a column for the maximum importance
importance_rf_model_df$Feature<-rownames(importance_rf_model_df)
importance_rf_model_df <- importance_rf_model_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_rf_model_df)
}
## CN Dementia MCI Feature MaxImportance
## 1 76.557522 12.671541 100.000000 cg15501526 100.00000
## 2 47.675219 49.970307 65.921767 age.now 65.92177
## 3 28.334482 35.277523 60.891135 cg08857872 60.89113
## 4 19.274188 31.947251 59.452175 cg01153376 59.45217
## 5 47.790469 29.664061 30.586808 cg04412904 47.79047
## 6 28.165143 35.837651 47.429820 cg11331837 47.42982
## 7 25.038881 47.287396 24.289313 cg12279734 47.28740
## 8 46.860287 24.788670 27.314640 cg23658987 46.86029
## 9 45.232383 13.199544 26.119442 cg10240127 45.23238
## 10 29.794620 11.547286 44.896868 cg27086157 44.89687
## 11 44.816028 29.809458 42.279680 cg02621446 44.81603
## 12 22.935623 43.981809 41.465775 cg00154902 43.98181
## 13 25.428106 43.759595 19.056274 cg24506579 43.75960
## 14 35.657086 41.969176 19.300855 cg00689685 41.96918
## 15 34.444474 13.611592 41.381808 cg08198851 41.38181
## 16 40.753056 21.386297 34.859047 cg10738648 40.75306
## 17 40.534123 21.074843 13.676503 cg03129555 40.53412
## 18 11.829463 33.976102 40.295951 cg02320265 40.29595
## 19 39.206322 16.461197 40.148590 cg12228670 40.14859
## 20 30.855437 36.426905 39.906142 cg25259265 39.90614
## 21 18.800224 22.642347 39.562003 cg14710850 39.56200
## 22 26.927646 27.912295 39.310410 cg01933473 39.31041
## 23 28.027166 13.440389 38.727770 cg12466610 38.72777
## 24 23.150847 38.719307 28.005562 cg19512141 38.71931
## 25 34.098424 6.404334 38.633630 cg01921484 38.63363
## 26 2.321973 38.581729 13.038975 cg14307563 38.58173
## 27 38.466357 27.599100 28.386039 cg24861747 38.46636
## 28 29.832180 30.004096 38.465164 cg00962106 38.46516
## 29 20.868241 32.188190 38.200852 cg23916408 38.20085
## 30 38.199229 35.109378 38.010718 cg02225060 38.19923
## 31 32.813248 29.587642 37.770671 cg00616572 37.77067
## 32 26.756275 18.379489 37.624229 cg16652920 37.62423
## 33 8.558735 22.036059 37.485375 PC3 37.48538
## 34 25.191758 16.064688 37.199630 cg06715136 37.19963
## 35 9.905268 37.006968 22.922249 cg12284872 37.00697
## 36 15.665099 25.743729 36.962434 cg05096415 36.96243
## 37 23.716251 36.829351 26.997109 cg18821122 36.82935
## 38 30.366998 36.804164 13.954492 cg19503462 36.80416
## 39 31.885237 24.914528 36.664723 cg15865722 36.66472
## 40 23.312543 27.609555 36.510004 cg07028768 36.51000
## 41 8.134870 28.139986 36.484573 cg17906851 36.48457
## 42 36.459120 34.507209 31.464193 cg00999469 36.45912
## 43 28.738331 25.224942 36.200164 cg12146221 36.20016
## 44 7.749586 36.143999 13.522490 cg06697310 36.14400
## 45 36.019111 20.194962 32.815123 cg14293999 36.01911
## 46 25.127638 35.585410 34.654922 cg02494911 35.58541
## 47 33.110587 30.573956 35.307728 cg01667144 35.30773
## 48 35.281334 33.110445 29.224294 cg26069044 35.28133
## 49 23.499631 28.306483 34.908205 cg10750306 34.90820
## 50 21.957287 23.375092 34.851217 cg23432430 34.85122
## 51 34.709456 33.075014 13.472489 cg14564293 34.70946
## 52 29.164687 18.215493 34.574848 cg24873924 34.57485
## 53 23.997078 26.213829 34.437667 cg26948066 34.43767
## 54 18.505626 28.989207 34.349722 cg20139683 34.34972
## 55 34.301912 22.967946 9.104703 cg03084184 34.30191
## 56 31.138920 34.231962 27.764958 cg05321907 34.23196
## 57 5.548237 34.122444 33.127313 cg27341708 34.12244
## 58 27.036024 34.096317 33.376387 cg06112204 34.09632
## 59 21.011794 34.040792 15.561797 cg13885788 34.04079
## 60 11.227817 34.009731 17.268730 cg00247094 34.00973
## 61 33.982197 25.390858 31.641726 cg12776173 33.98220
## 62 26.030267 32.529626 33.893198 cg02932958 33.89320
## 63 33.840561 15.297753 30.966868 cg03660162 33.84056
## 64 30.260520 31.239948 33.696576 cg06378561 33.69658
## 65 27.187461 33.624404 30.548059 cg05841700 33.62440
## 66 33.585063 26.814741 16.392102 cg01680303 33.58506
## 67 26.426384 17.826922 33.540052 cg12012426 33.54005
## 68 33.356186 21.491224 16.797766 cg11247378 33.35619
## 69 30.685059 33.265336 26.224587 cg12682323 33.26534
## 70 26.239028 33.210415 15.029442 cg06536614 33.21041
## 71 29.680946 33.107219 28.301134 cg02372404 33.10722
## 72 18.245164 12.269336 32.985202 cg12738248 32.98520
## 73 27.469747 32.951864 29.995839 cg01413796 32.95186
## 74 27.558199 32.904776 13.894549 cg27577781 32.90478
## 75 32.881946 20.071800 21.381645 cg16771215 32.88195
## 76 32.861097 20.964997 20.609993 cg02356645 32.86110
## 77 16.306166 16.125543 32.799640 PC2 32.79964
## 78 20.659967 32.531707 19.021606 cg18819889 32.53171
## 79 23.801364 23.299663 32.457994 cg00322003 32.45799
## 80 14.174655 32.385627 15.260737 cg14527649 32.38563
## 81 22.707934 24.855162 32.171025 cg27272246 32.17103
## 82 13.171397 32.143854 23.934534 cg16749614 32.14385
## 83 30.085994 14.457693 32.140575 cg16788319 32.14057
## 84 32.127299 11.985407 28.995139 cg10369879 32.12730
## 85 23.898218 18.617903 32.109373 cg10985055 32.10937
## 86 18.519748 28.788120 32.102136 PC1 32.10214
## 87 20.602486 31.893799 22.148440 cg26474732 31.89380
## 88 23.252282 20.310270 31.798601 cg04248279 31.79860
## 89 15.645024 16.207616 31.662536 cg25879395 31.66254
## 90 16.935190 31.553067 23.395061 cg00675157 31.55307
## 91 31.453159 14.451961 17.444123 cg11438323 31.45316
## 92 23.553026 27.943710 31.431104 cg12784167 31.43110
## 93 31.422718 26.545908 19.831244 cg03088219 31.42272
## 94 19.151605 31.397392 13.684714 cg25561557 31.39739
## 95 22.831593 31.354120 25.575486 cg26757229 31.35412
## 96 31.335151 16.982945 12.502967 cg01013522 31.33515
## 97 29.681319 31.297956 24.964984 cg12534577 31.29796
## 98 30.967252 22.276823 30.641848 cg15535896 30.96725
## 99 30.249127 30.788217 25.551552 cg21209485 30.78822
## 100 14.521028 23.525064 30.692510 cg25758034 30.69251
## 101 30.587745 29.449598 27.017735 cg02981548 30.58775
## 102 29.978427 8.324429 27.918327 cg20685672 29.97843
## 103 28.019829 17.529064 29.975951 cg07138269 29.97595
## 104 19.362432 29.800373 27.134778 cg00272795 29.80037
## 105 12.463063 22.450755 29.651498 cg03071582 29.65150
## 106 9.639006 25.747210 29.304954 cg06950937 29.30495
## 107 29.287502 9.277803 15.968740 cg07523188 29.28750
## 108 10.471845 29.219275 24.146277 cg17186592 29.21928
## 109 29.062834 26.923298 17.453515 cg08584917 29.06283
## 110 29.008156 26.881553 24.833864 cg15775217 29.00816
## 111 28.898041 16.542972 23.500265 cg11187460 28.89804
## 112 28.498243 19.588062 20.558468 cg13080267 28.49824
## 113 14.736947 21.014955 28.393912 cg00696044 28.39391
## 114 23.053910 22.307916 28.382166 cg03982462 28.38217
## 115 28.329989 21.987090 25.804974 cg14240646 28.32999
## 116 28.325767 16.993197 19.351713 cg03327352 28.32577
## 117 14.287716 28.315537 14.325393 cg24851651 28.31554
## 118 26.820839 28.231727 20.523253 cg20370184 28.23173
## 119 18.581261 27.901487 23.766773 cg15633912 27.90149
## 120 18.656208 18.177485 27.776474 cg27639199 27.77647
## 121 27.700469 12.118715 21.581429 cg08779649 27.70047
## 122 22.632023 27.608406 18.684827 cg07152869 27.60841
## 123 27.583688 18.351688 23.958175 cg17421046 27.58369
## 124 27.502771 16.371358 25.903356 cg01128042 27.50277
## 125 27.438427 13.564315 23.800666 cg09584650 27.43843
## 126 19.321020 27.252073 17.255848 cg14924512 27.25207
## 127 19.916538 18.011699 27.102551 cg05234269 27.10255
## 128 25.160604 26.955728 27.071172 cg19377607 27.07117
## 129 24.824570 23.641732 26.942593 cg11133939 26.94259
## 130 26.077300 12.871914 26.887496 cg05570109 26.88750
## 131 26.821967 21.020844 25.702543 cg24859648 26.82197
## 132 13.518090 14.804948 26.803240 cg11227702 26.80324
## 133 26.609214 18.554915 20.623618 cg04664583 26.60921
## 134 23.420930 10.009226 26.429006 cg21854924 26.42901
## 135 19.925757 26.274847 23.587711 cg20913114 26.27485
## 136 12.953184 15.432140 26.213690 cg17738613 26.21369
## 137 26.107548 19.892290 22.551946 cg09854620 26.10755
## 138 26.020994 24.606325 15.409204 cg25436480 26.02099
## 139 16.236454 21.542269 25.879241 cg08861434 25.87924
## 140 11.775460 21.020269 25.754330 cg21697769 25.75433
## 141 22.790673 12.087909 25.517484 cg20678988 25.51748
## 142 20.719592 25.424682 21.273081 cg27452255 25.42468
## 143 21.655604 25.406488 20.002807 cg20507276 25.40649
## 144 19.329855 25.024138 22.288192 cg23161429 25.02414
## 145 24.369803 24.606859 20.308675 cg01549082 24.60686
## 146 16.323001 22.498629 24.606185 cg16579946 24.60619
## 147 24.367117 22.586984 21.086705 cg22274273 24.36712
## 148 14.876059 23.719784 24.335876 cg06864789 24.33588
## 149 23.991716 15.416105 0.000000 cg18339359 23.99172
## 150 23.097285 23.910924 17.064792 cg26219488 23.91092
## 151 21.156788 23.766333 19.172032 cg16178271 23.76633
## 152 5.224382 21.759585 11.946569 cg07480176 21.75959
## 153 12.977742 19.689652 19.476222 cg16715186 19.68965
## 154 14.523868 15.847457 18.882385 cg06118351 18.88238
## 155 15.989563 9.329725 14.835184 cg17429539 15.98956
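The max-across-classes ranking applied above (keep each feature's largest per-class importance via `pmax()`, then sort descending) can be illustrated on a toy table with invented feature names and values:

```r
# Toy per-class importance table; names and values are made up.
imp <- data.frame(
  Feature  = c("f1", "f2", "f3"),
  CN       = c(10, 40, 25),
  Dementia = c(30, 15, 20),
  MCI      = c(20, 35, 50)
)
# Element-wise maximum across the class columns, then rank by it
imp$MaxImportance <- pmax(imp$CN, imp$Dementia, imp$MCI)
imp <- imp[order(-imp$MaxImportance), ]
print(imp$Feature)   # "f3" "f2" "f1"
```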
if(METHOD_FEATURE_FLAG == 1){
importance_melted_rf_model_df <- importance_rf_model_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_rf_model_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
## Warning: The melt generic in data.table has been passed a data.frame and will attempt to
## redirect to the relevant reshape2 method; please note that reshape2 is superseded and is no
## longer actively developed, and this redirection is now deprecated. To continue using melt
## methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the
## namespace, i.e. reshape2::melt(.). In the next version, this warning will become an error.
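The warning above comes from `melt()` being dispatched through data.table to the superseded reshape2 package. An equivalent wide-to-long reshape with `tidyr::pivot_longer` (assuming tidyr is available) avoids it; a sketch on a toy importance table with made-up values:

```r
# Reshape a wide per-class importance table to long format with tidyr,
# instead of the deprecated reshape2::melt redirection.
library(tidyr)
imp <- data.frame(Feature = c("f1", "f2"), CN = c(1, 2), MCI = c(3, 4))
long <- pivot_longer(imp, cols = -Feature,
                     names_to = "Class", values_to = "Importance")
print(long)   # 4 rows: Feature, Class, Importance
```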
if(METHOD_FEATURE_FLAG == 1){
print(importance_rf_model_df %>% head(20))
print("the top 20 features based on max way:")
print(head(importance_rf_model_df,n=20)$Feature)
importance_melted_rf_model_df <- importance_rf_model_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_rf_model_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
## CN Dementia MCI Feature MaxImportance
## 1 76.55752 12.67154 100.00000 cg15501526 100.00000
## 2 47.67522 49.97031 65.92177 age.now 65.92177
## 3 28.33448 35.27752 60.89113 cg08857872 60.89113
## 4 19.27419 31.94725 59.45217 cg01153376 59.45217
## 5 47.79047 29.66406 30.58681 cg04412904 47.79047
## 6 28.16514 35.83765 47.42982 cg11331837 47.42982
## 7 25.03888 47.28740 24.28931 cg12279734 47.28740
## 8 46.86029 24.78867 27.31464 cg23658987 46.86029
## 9 45.23238 13.19954 26.11944 cg10240127 45.23238
## 10 29.79462 11.54729 44.89687 cg27086157 44.89687
## 11 44.81603 29.80946 42.27968 cg02621446 44.81603
## 12 22.93562 43.98181 41.46577 cg00154902 43.98181
## 13 25.42811 43.75960 19.05627 cg24506579 43.75960
## 14 35.65709 41.96918 19.30086 cg00689685 41.96918
## 15 34.44447 13.61159 41.38181 cg08198851 41.38181
## 16 40.75306 21.38630 34.85905 cg10738648 40.75306
## 17 40.53412 21.07484 13.67650 cg03129555 40.53412
## 18 11.82946 33.97610 40.29595 cg02320265 40.29595
## 19 39.20632 16.46120 40.14859 cg12228670 40.14859
## 20 30.85544 36.42690 39.90614 cg25259265 39.90614
## [1] "the top 20 features based on max way:"
## [1] "cg15501526" "age.now" "cg08857872" "cg01153376" "cg04412904" "cg11331837" "cg12279734"
## [8] "cg23658987" "cg10240127" "cg27086157" "cg02621446" "cg00154902" "cg24506579" "cg00689685"
## [15] "cg08198851" "cg10738648" "cg03129555" "cg02320265" "cg12228670" "cg25259265"
## Warning: The melt generic in data.table has been passed a data.frame and will attempt to
## redirect to the relevant reshape2 method; please note that reshape2 is superseded and is no
## longer actively developed, and this redirection is now deprecated. To continue using melt
## methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the
## namespace, i.e. reshape2::melt(.). In the next version, this warning will become an error.
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
print(auc_value)
FeatEval_Freq_rf_AUC<-auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4||METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
print(auc_value)
FeatEval_Freq_rf_AUC<-auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
print(auc_value)
FeatEval_Freq_rf_AUC<-auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(test_data_RFM1$DX)
for (class in classes) {
binary_labels <- ifelse(test_data_RFM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = 2,   # col 2 so the first curve matches the legend
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.6859
## The AUC value for class CN is: 0.6858745
##
## Class: Dementia
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.6633
## The AUC value for class Dementia is: 0.6633117
##
## Class: MCI
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.6451
## The AUC value for class MCI is: 0.6450677
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Freq_rf_AUC<-mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.6647513
print(FeatEval_Freq_rf_AUC)
## [1] 0.6647513
df_SVM<-processed_data
featureName_SVM1<-AfterProcess_FeatureName
trainIndex <- createDataPartition(df_SVM$DX, p = 0.7, list = FALSE)
train_data_SVM1 <- df_SVM[trainIndex, ]
test_data_SVM1 <- df_SVM[-trainIndex, ]
X_train_SVM1 <- subset(train_data_SVM1,select = -DX)
y_train_SVM1 <- train_data_SVM1$DX
X_test_SVM1 <- subset(test_data_SVM1, select= -DX )
y_test_SVM1 <- test_data_SVM1$DX
train_control <- trainControl(method = "cv", number = 5, classProbs = TRUE)
svm_model <- caret::train(DX ~ ., data = train_data_SVM1,
method = "svmRadial",
trControl = train_control)
print(svm_model)
## Support Vector Machines with Radial Basis Function Kernel
##
## 455 samples
## 155 predictors
## 3 classes: 'CN', 'Dementia', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 364, 364, 363, 364, 365
## Resampling results across tuning parameters:
##
## C Accuracy Kappa
## 0.25 0.6922456 0.4922408
## 0.50 0.7010368 0.5080797
## 1.00 0.7054080 0.5068655
##
## Tuning parameter 'sigma' was held constant at a value of 0.003349724
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.003349724 and C = 1.
print(svm_model$bestTune)
## sigma C
## 3 0.003349724 1
mean_accuracy_svm_model<- mean(svm_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_svm_model)
## [1] 0.6995634
FeatEval_Freq_mean_accuracy_cv_svm<-mean_accuracy_svm_model
print(FeatEval_Freq_mean_accuracy_cv_svm)
## [1] 0.6995634
train_predictions <- predict(svm_model, newdata = train_data_SVM1)
train_accuracy <- mean(train_predictions == train_data_SVM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.958241758241758"
FeatEval_Freq_svm_trainAccuracy <- train_accuracy
print(FeatEval_Freq_svm_trainAccuracy)
## [1] 0.9582418
predictions <- predict(svm_model, newdata = test_data_SVM1)
cm_FeatEval_Freq_svm<-caret::confusionMatrix(predictions,test_data_SVM1$DX)
print(cm_FeatEval_Freq_svm)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN Dementia MCI
## CN 42 3 14
## Dementia 5 18 4
## MCI 19 7 81
##
## Overall Statistics
##
## Accuracy : 0.7306
## 95% CI : (0.6621, 0.7918)
## No Information Rate : 0.513
## P-Value [Acc > NIR] : 5.406e-10
##
## Kappa : 0.5439
##
## Mcnemar's Test P-Value : 0.5568
##
## Statistics by Class:
##
## Class: CN Class: Dementia Class: MCI
## Sensitivity 0.6364 0.64286 0.8182
## Specificity 0.8661 0.94545 0.7234
## Pos Pred Value 0.7119 0.66667 0.7570
## Neg Pred Value 0.8209 0.93976 0.7907
## Prevalence 0.3420 0.14508 0.5130
## Detection Rate 0.2176 0.09326 0.4197
## Detection Prevalence 0.3057 0.13990 0.5544
## Balanced Accuracy 0.7513 0.79416 0.7708
cm_FeatEval_Freq_svm_Accuracy <- cm_FeatEval_Freq_svm$overall["Accuracy"]
cm_FeatEval_Freq_svm_Kappa <- cm_FeatEval_Freq_svm$overall["Kappa"]
print(cm_FeatEval_Freq_svm_Accuracy)
## Accuracy
## 0.7305699
print(cm_FeatEval_Freq_svm_Kappa)
## Kappa
## 0.5439426
Let’s take a look at the feature importance of the trained model.
library(iml)
predictor_SVM <- Predictor$new(svm_model,data = df_SVM,y=df_SVM$DX)
importance_SVM <- FeatureImp$new(predictor_SVM,loss="ce")
print(importance_SVM)
## Interpretation method: FeatureImp
## error function: ce
##
## Analysed predictor:
## Prediction task: classification
## Classes:
##
## Analysed data:
## Sampling from data.frame with 648 rows and 156 columns.
##
##
## Head of results:
## feature importance.05 importance importance.95 permutation.error
## 1 cg08861434 1.090141 1.140845 1.225352 0.1250000
## 2 cg10240127 1.087324 1.112676 1.112676 0.1219136
## 3 cg16579946 1.047887 1.112676 1.112676 0.1219136
## 4 cg25879395 1.084507 1.112676 1.126761 0.1219136
## 5 cg02225060 1.061972 1.098592 1.138028 0.1203704
## 6 cg00962106 1.000000 1.084507 1.095775 0.1188272
plot(importance_SVM)
library(vip)
vip(svm_model, method = "permute", train = train_data_SVM1, target = "DX",
    nsim = 10, metric = "bal_accuracy", pred_wrapper = predict)
importance_SVM_df<-importance_SVM$results
if(METHOD_FEATURE_FLAG == 5){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The AUC value is:")
auc_value <- roc_curve$auc
print(auc_value)
FeatEval_Freq_svm_AUC<-auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4||METHOD_FEATURE_FLAG==6){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The AUC value is:")
auc_value <- roc_curve$auc
print(auc_value)
FeatEval_Freq_svm_AUC<-auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The AUC value is:")
auc_value <- roc_curve$auc
print(auc_value)
FeatEval_Freq_svm_AUC<-auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(test_data_SVM1$DX)
for (class in classes) {
binary_labels <- ifelse(test_data_SVM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = 2, # curve colors match the legend: class i uses col i + 1
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i + 1, lwd = 2)
}
legend("bottomright", legend = classes, col = (1:length(classes)) + 1, lwd = 2)
}
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Class: CN
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 127 controls (binary_labels 0) < 66 cases (binary_labels 1).
## Area under the curve: 0.5208
## The AUC value for class CN is: 0.5207588
##
## Class: Dementia
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 165 controls (binary_labels 0) < 28 cases (binary_labels 1).
## Area under the curve: 0.532
## The AUC value for class Dementia is: 0.5320346
##
## Class: MCI
##
## Call:
## roc.default(response = binary_labels, predictor = prob_predictions[, class])
##
## Data: prob_predictions[, class] in 94 controls (binary_labels 0) < 99 cases (binary_labels 1).
## Area under the curve: 0.5214
## The AUC value for class MCI is: 0.5213841
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Freq_svm_AUC<-mean_auc
}
## The mean AUC value across all classes with one versus rest method is: 0.5247258
print(FeatEval_Freq_svm_AUC)
## [1] 0.5247258
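The one-versus-rest averaging used above can be illustrated on a toy example, assuming the same pROC package; the class labels and probability matrix below are made up and stand in for `predict(..., type = "prob")`:

```r
library(pROC)

set.seed(42)
classes <- c("CN", "MCI", "Dementia")
labels  <- factor(sample(classes, 300, replace = TRUE), levels = classes)
# made-up class-probability matrix, one column per class, rows summing to 1
probs <- matrix(runif(300 * 3), ncol = 3, dimnames = list(NULL, classes))
probs <- probs / rowSums(probs)

# one-versus-rest: binarize the labels for each class, build an ROC curve
# against that class's probability column, then average the per-class AUCs
auc_values <- sapply(classes, function(cl) {
  as.numeric(auc(roc(ifelse(labels == cl, 1, 0), probs[, cl], quiet = TRUE)))
})
mean_auc_ovr <- mean(auc_values)
mean_auc_ovr  # probabilities are random here, so this lands near 0.5
```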
In the INPUT Session, “Metrics_Table_Output_FLAG” is the flag that controls whether this file’s metrics are written out, including the model-training-stage metrics and the performance metrics of the key features selected by the mean, median, and frequency methods.
Feature_and_model_Metrics <- c("Training Accuracy", "Test Accuracy", "Test Kappa", "AUC", "Average Test Accuracy during Cross Validation")
ModelTrain_stage_Logistic_metrics_ModelTrainStage <- c(modelTrain_LRM1_trainAccuracy, cm_modelTrain_LRM1_Accuracy, cm_modelTrain_LRM1_Kappa,modelTrain_LRM1_AUC, modelTrain_mean_accuracy_cv_LRM1)
ModelTrain_stage_Logistic_metrics_Feature_Mean<-c(FeatEval_Mean_LRM1_trainAccuracy,
cm_FeatEval_Mean_LRM1_Accuracy,cm_FeatEval_Mean_LRM1_Kappa,FeatEval_Mean_LRM1_AUC, FeatEval_Mean_mean_accuracy_cv_LRM1)
ModelTrain_stage_Logistic_metrics_Feature_Median<-c(FeatEval_Median_LRM1_trainAccuracy,
cm_FeatEval_Median_LRM1_Accuracy,cm_FeatEval_Median_LRM1_Kappa,FeatEval_Median_LRM1_AUC, FeatEval_Median_mean_accuracy_cv_LRM1)
ModelTrain_stage_Logistic_metrics_Feature_Freq<-c(FeatEval_Freq_LRM1_trainAccuracy,
cm_FeatEval_Freq_LRM1_Accuracy,cm_FeatEval_Freq_LRM1_Kappa,FeatEval_Freq_LRM1_AUC,FeatEval_Freq_mean_accuracy_cv_LRM1)
ModelTrain_stage_Logistic_metrics<-c(ModelTrain_stage_Logistic_metrics_ModelTrainStage, ModelTrain_stage_Logistic_metrics_Feature_Mean,ModelTrain_stage_Logistic_metrics_Feature_Median,ModelTrain_stage_Logistic_metrics_Feature_Freq)
ModelTrain_stage_ElasticNet_metrics_ModelTrainStage <- c(modelTrain_ENM1_trainAccuracy, cm_modelTrain_ENM1_Accuracy, cm_modelTrain_ENM1_Kappa,modelTrain_ENM1_AUC, modelTrain_mean_accuracy_cv_ENM1)
ModelTrain_stage_ElasticNet_metrics_Feature_Mean<-c(FeatEval_Mean_ENM1_trainAccuracy,
cm_FeatEval_Mean_ENM1_Accuracy,cm_FeatEval_Mean_ENM1_Kappa,FeatEval_Mean_ENM1_AUC, FeatEval_Mean_mean_accuracy_cv_ENM1)
ModelTrain_stage_ElasticNet_metrics_Feature_Median<-c(FeatEval_Median_ENM1_trainAccuracy,
cm_FeatEval_Median_ENM1_Accuracy,cm_FeatEval_Median_ENM1_Kappa,FeatEval_Median_ENM1_AUC, FeatEval_Median_mean_accuracy_cv_ENM1)
ModelTrain_stage_ElasticNet_metrics_Feature_Freq<-c(FeatEval_Freq_ENM1_trainAccuracy,
cm_FeatEval_Freq_ENM1_Accuracy,cm_FeatEval_Freq_ENM1_Kappa,FeatEval_Freq_ENM1_AUC,FeatEval_Freq_mean_accuracy_cv_ENM1)
ModelTrain_stage_ElasticNet_metrics<-c(ModelTrain_stage_ElasticNet_metrics_ModelTrainStage, ModelTrain_stage_ElasticNet_metrics_Feature_Mean,ModelTrain_stage_ElasticNet_metrics_Feature_Median,ModelTrain_stage_ElasticNet_metrics_Feature_Freq)
ModelTrain_stage_XGBoost_metrics_ModelTrainStage <- c(modelTrain_xgb_trainAccuracy, cm_modelTrain_xgb_Accuracy, cm_modelTrain_xgb_Kappa,modelTrain_xgb_AUC, modelTrain_mean_accuracy_cv_xgb)
ModelTrain_stage_XGBoost_metrics_Feature_Mean<-c(FeatEval_Mean_xgb_trainAccuracy,
cm_FeatEval_Mean_xgb_Accuracy,cm_FeatEval_Mean_xgb_Kappa,FeatEval_Mean_xgb_AUC, FeatEval_Mean_mean_accuracy_cv_xgb)
ModelTrain_stage_XGBoost_metrics_Feature_Median<-c(FeatEval_Median_xgb_trainAccuracy,
cm_FeatEval_Median_xgb_Accuracy,cm_FeatEval_Median_xgb_Kappa,FeatEval_Median_xgb_AUC, FeatEval_Median_mean_accuracy_cv_xgb)
ModelTrain_stage_XGBoost_metrics_Feature_Freq<-c(FeatEval_Freq_xgb_trainAccuracy,
cm_FeatEval_Freq_xgb_Accuracy,cm_FeatEval_Freq_xgb_Kappa,FeatEval_Freq_xgb_AUC,FeatEval_Freq_mean_accuracy_cv_xgb)
ModelTrain_stage_XGBoost_metrics<-c(ModelTrain_stage_XGBoost_metrics_ModelTrainStage, ModelTrain_stage_XGBoost_metrics_Feature_Mean,ModelTrain_stage_XGBoost_metrics_Feature_Median,ModelTrain_stage_XGBoost_metrics_Feature_Freq)
ModelTrain_stage_RandomForest_metrics_ModelTrainStage <- c(modelTrain_rf_trainAccuracy, cm_modelTrain_rf_Accuracy, cm_modelTrain_rf_Kappa,modelTrain_rf_AUC, modelTrain_mean_accuracy_cv_rf)
ModelTrain_stage_RandomForest_metrics_Feature_Mean<-c(FeatEval_Mean_rf_trainAccuracy,
cm_FeatEval_Mean_rf_Accuracy,cm_FeatEval_Mean_rf_Kappa,FeatEval_Mean_rf_AUC, FeatEval_Mean_mean_accuracy_cv_rf)
ModelTrain_stage_RandomForest_metrics_Feature_Median<-c(FeatEval_Median_rf_trainAccuracy,
cm_FeatEval_Median_rf_Accuracy,cm_FeatEval_Median_rf_Kappa,FeatEval_Median_rf_AUC, FeatEval_Median_mean_accuracy_cv_rf)
ModelTrain_stage_RandomForest_metrics_Feature_Freq<-c(FeatEval_Freq_rf_trainAccuracy,
cm_FeatEval_Freq_rf_Accuracy,cm_FeatEval_Freq_rf_Kappa,FeatEval_Freq_rf_AUC,FeatEval_Freq_mean_accuracy_cv_rf)
ModelTrain_stage_RandomForest_metrics<-c(ModelTrain_stage_RandomForest_metrics_ModelTrainStage, ModelTrain_stage_RandomForest_metrics_Feature_Mean,ModelTrain_stage_RandomForest_metrics_Feature_Median,ModelTrain_stage_RandomForest_metrics_Feature_Freq)
ModelTrain_stage_SVM_metrics_ModelTrainStage <- c(modelTrain_svm_trainAccuracy, cm_modelTrain_svm_Accuracy, cm_modelTrain_svm_Kappa,modelTrain_svm_AUC, modelTrain_mean_accuracy_cv_svm)
ModelTrain_stage_SVM_metrics_Feature_Mean<-c(FeatEval_Mean_svm_trainAccuracy,
cm_FeatEval_Mean_svm_Accuracy,cm_FeatEval_Mean_svm_Kappa,FeatEval_Mean_svm_AUC, FeatEval_Mean_mean_accuracy_cv_svm)
ModelTrain_stage_SVM_metrics_Feature_Median<-c(FeatEval_Median_svm_trainAccuracy,
cm_FeatEval_Median_svm_Accuracy,cm_FeatEval_Median_svm_Kappa,FeatEval_Median_svm_AUC, FeatEval_Median_mean_accuracy_cv_svm)
ModelTrain_stage_SVM_metrics_Feature_Freq<-c(FeatEval_Freq_svm_trainAccuracy,
cm_FeatEval_Freq_svm_Accuracy,cm_FeatEval_Freq_svm_Kappa,FeatEval_Freq_svm_AUC,FeatEval_Freq_mean_accuracy_cv_svm)
ModelTrain_stage_SVM_metrics<-c(ModelTrain_stage_SVM_metrics_ModelTrainStage, ModelTrain_stage_SVM_metrics_Feature_Mean,ModelTrain_stage_SVM_metrics_Feature_Median,ModelTrain_stage_SVM_metrics_Feature_Freq)
if (METHOD_FEATURE_FLAG == 1) {
  classificationType <- "Multiclass"
} else if (METHOD_FEATURE_FLAG == 2) {
  classificationType <- "Multiclass and use PCA"
} else if (METHOD_FEATURE_FLAG == 3) {
  classificationType <- "Binary"
} else if (METHOD_FEATURE_FLAG == 4) {
  classificationType <- "CN vs Dementia (AD)"
} else if (METHOD_FEATURE_FLAG == 5) {
  classificationType <- "CN vs MCI"
} else if (METHOD_FEATURE_FLAG == 6) {
  classificationType <- "MCI vs Dementia"
}
library(dplyr)
Metrics_results_df <- data.frame(
  `Number_of_CpG_used` = rep(Number_N_TopNCpGs, 20),
  `Number_of_Phenotype_Features_Used` = rep(5, 20),
  `Total_Number_of_features_before_Preprocessing` = rep(Number_N_TopNCpGs + 5, 20),
  `Number_of_features_after_processing` = rep(Num_feaForProcess, 20),
  `Classification_Type` = rep(classificationType, 20),
  `Number_of_Key_features_Selected_(Mean,Median)` = rep(INPUT_NUMBER_FEATURES, 20),
  `Number_of_Key_features_remained_based_on_frequency_methods` = rep(Num_KeyFea_Frequency, 20),
  `Metrics_Stage` = c(rep("Model Train Stage", 5),
                      rep("Key Feature Evaluation (Select based on Mean) ", 5),
                      rep("Key Feature Evaluation (Select based on Median) ", 5),
                      rep("Key Feature Evaluation (Select based on Frequency) ", 5)),
  `Metric` = rep(Feature_and_model_Metrics, 4),
  `Logistic_regression` = ModelTrain_stage_Logistic_metrics,
  `Elastic_Net` = ModelTrain_stage_ElasticNet_metrics,
  `XGBoost` = ModelTrain_stage_XGBoost_metrics,
  `Random_Forest` = ModelTrain_stage_RandomForest_metrics,
  `SVM` = ModelTrain_stage_SVM_metrics
)
print(Metrics_results_df)
## Number_of_CpG_used Number_of_Phenotype_Features_Used
## 1 5000 5
## 2 5000 5
## 3 5000 5
## 4 5000 5
## 5 5000 5
## 6 5000 5
## 7 5000 5
## 8 5000 5
## 9 5000 5
## 10 5000 5
## 11 5000 5
## 12 5000 5
## 13 5000 5
## 14 5000 5
## 15 5000 5
## 16 5000 5
## 17 5000 5
## 18 5000 5
## 19 5000 5
## 20 5000 5
## Total_Number_of_features_before_Preprocessing Number_of_features_after_processing
## 1 5005 155
## 2 5005 155
## 3 5005 155
## 4 5005 155
## 5 5005 155
## 6 5005 155
## 7 5005 155
## 8 5005 155
## 9 5005 155
## 10 5005 155
## 11 5005 155
## 12 5005 155
## 13 5005 155
## 14 5005 155
## 15 5005 155
## 16 5005 155
## 17 5005 155
## 18 5005 155
## 19 5005 155
## 20 5005 155
## Classification_Type Number_of_Key_features_Selected_.Mean.Median.
## 1 Multiclass 250
## 2 Multiclass 250
## 3 Multiclass 250
## 4 Multiclass 250
## 5 Multiclass 250
## 6 Multiclass 250
## 7 Multiclass 250
## 8 Multiclass 250
## 9 Multiclass 250
## 10 Multiclass 250
## 11 Multiclass 250
## 12 Multiclass 250
## 13 Multiclass 250
## 14 Multiclass 250
## 15 Multiclass 250
## 16 Multiclass 250
## 17 Multiclass 250
## 18 Multiclass 250
## 19 Multiclass 250
## 20 Multiclass 250
## Number_of_Key_features_remained_based_on_frequency_methods
## 1 155
## 2 155
## 3 155
## 4 155
## 5 155
## 6 155
## 7 155
## 8 155
## 9 155
## 10 155
## 11 155
## 12 155
## 13 155
## 14 155
## 15 155
## 16 155
## 17 155
## 18 155
## 19 155
## 20 155
## Metrics_Stage
## 1 Model Train Stage
## 2 Model Train Stage
## 3 Model Train Stage
## 4 Model Train Stage
## 5 Model Train Stage
## 6 Key Feature Evaluation (Select based on Mean)
## 7 Key Feature Evaluation (Select based on Mean)
## 8 Key Feature Evaluation (Select based on Mean)
## 9 Key Feature Evaluation (Select based on Mean)
## 10 Key Feature Evaluation (Select based on Mean)
## 11 Key Feature Evaluation (Select based on Median)
## 12 Key Feature Evaluation (Select based on Median)
## 13 Key Feature Evaluation (Select based on Median)
## 14 Key Feature Evaluation (Select based on Median)
## 15 Key Feature Evaluation (Select based on Median)
## 16 Key Feature Evaluation (Select based on Frequency)
## 17 Key Feature Evaluation (Select based on Frequency)
## 18 Key Feature Evaluation (Select based on Frequency)
## 19 Key Feature Evaluation (Select based on Frequency)
## 20 Key Feature Evaluation (Select based on Frequency)
## Metric Logistic_regression Elastic_Net XGBoost
## 1 Training Accuracy 0.9604396 0.8637363 1.0000000
## 2 Test Accuracy 0.7098446 0.7202073 0.5854922
## 3 Test Kappa 0.4987013 0.4986772 0.2510671
## 4 AUC 0.8328700 0.8566272 0.7357863
## 5 Average Test Accuracy during Cross Validation 0.6331631 0.5868408 0.5686429
## 6 Training Accuracy 0.9604396 0.8637363 1.0000000
## 7 Test Accuracy 0.7098446 0.7202073 0.6373057
## 8 Test Kappa 0.4987013 0.4986772 0.3435693
## 9 AUC 0.8329421 0.8566272 0.6960104
## 10 Average Test Accuracy during Cross Validation 0.6326693 0.5868952 0.5663579
## 11 Training Accuracy 0.9604396 0.8637363 1.0000000
## 12 Test Accuracy 0.7098446 0.7202073 0.6269430
## 13 Test Kappa 0.4987013 0.4986772 0.3309903
## 14 AUC 0.8329058 0.8566272 0.6975005
## 15 Average Test Accuracy during Cross Validation 0.6326693 0.5868408 0.5662996
## 16 Training Accuracy 0.9604396 0.8637363 1.0000000
## 17 Test Accuracy 0.7098446 0.7202073 0.6528497
## 18 Test Kappa 0.4987013 0.4986772 0.3718547
## 19 AUC 0.8328700 0.8566272 0.7359191
## 20 Average Test Accuracy during Cross Validation 0.6329108 0.5868408 0.5677591
## Random_Forest SVM
## 1 1.0000000 0.9384615
## 2 0.5699482 0.6632124
## 3 0.1684489 0.4562673
## 4 0.6475618 0.5617548
## 5 0.5473039 0.7135069
## 6 1.0000000 0.9516484
## 7 0.5647668 0.6839378
## 8 0.1467368 0.4754734
## 9 0.6536684 0.5420210
## 10 0.5443487 0.7114723
## 11 1.0000000 0.9538462
## 12 0.5492228 0.7461140
## 13 0.1343060 0.5688625
## 14 0.6415256 0.5198441
## 15 0.5428752 0.6769049
## 16 1.0000000 0.9582418
## 17 0.5492228 0.7305699
## 18 0.1283742 0.5439426
## 19 0.6647513 0.5247258
## 20 0.5472878 0.6995634
Write out the model-metrics data frame to a CSV file if FLAG_WRITE_METRICS_DF = TRUE.
if(FLAG_WRITE_METRICS_DF){
write.csv(Metrics_results_df,OUTUT_PerformanceMetricsCSV_PATHNAME,row.names = FALSE)
print("Metrics Performance output path:")
print(OUTUT_PerformanceMetricsCSV_PATHNAME)
}
## [1] "Metrics Performance output path:"
## [1] "C:\\Users\\wangtia\\Desktop\\AD Risk\\part2\\VersionHistory\\Version7_AutoKnit_Results\\Method1_MultiClass\\Method1_MultiClass_PerformanceMetrics\\INPUT_5000CpGs_250SelFeature_PerMetrics.csv"
Phenotype part data frame: “phenoticPart_RAW”
Raw merged data frame: “merged_df_raw”
Processed data, i.e., the data used for model training.
The name of “processed_data” can be:
“processed_data_m1”, which uses method one to process the data.
“processed_data_m2”, which uses method two to process the data; note that the features will be principal components.
“processed_data_m3”, which uses method three to process the data. This method converts “DX” to a binary class: “CN” stays the same, while “MCI” and “Dementia” are converted to “CI”.
Note that “processed_data_m3_df” is the data-frame form of “processed_data_m3”, with sample names as row names, and it is assigned to “processed_dataFrame”.
“processed_data_m4”, which uses method four to process the data. This method filters “DX” (drops the “MCI” class), keeping only the CN and Dementia (AD) classes.
“processed_data_m5”, which uses method five to process the data. This method filters “DX” (drops the “Dementia” class), keeping only the CN and MCI classes.
“processed_data_m6”, which uses method six to process the data. This method filters “DX” (drops the “CN” class), keeping only the MCI and Dementia classes.
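The DX recodings behind methods 3 through 6 can be sketched on a toy data frame (the column values here are invented for illustration, assuming DX is a factor):

```r
df <- data.frame(DX = factor(c("CN", "MCI", "Dementia", "CN", "MCI")))

# Method 3: collapse MCI and Dementia into a single "CI" class
m3 <- df
m3$DX <- factor(ifelse(m3$DX == "CN", "CN", "CI"), levels = c("CN", "CI"))

# Methods 4-6: drop one class and keep the remaining two,
# dropping the unused factor level so models see a clean binary outcome
m4 <- droplevels(subset(df, DX != "MCI"))       # CN vs Dementia
m5 <- droplevels(subset(df, DX != "Dementia"))  # CN vs MCI
m6 <- droplevels(subset(df, DX != "CN"))        # MCI vs Dementia

levels(m3$DX)  # "CN" "CI"
levels(m4$DX)  # "CN" "Dementia"
```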
The name of “AfterProcess_FeatureName” can be:
Feature importance ordered by quantile (data frame): “combined_importance_quantiles”
Feature importance ordered by mean (data frame): “combined_importance_Avg_ordered”
Feature frequency / common-feature data frames:
“frequency_feature_df_RAW_ordered”: the selected features’ frequencies, ordered by total frequency count. The number of top features selected in the first step is set in the INPUT session via “INPUT_NUMBER_FEATURES”.
“feature_df_full”: the frequencies of all features from the steps of the frequency method; it is not ordered.
“all_combined_df_impAvg”: the combined table of frequency and feature importance; it is not ordered.
Output data frame with features selected by the mean method: “df_selected_Mean”. This data frame has no column named “SampleID”.
Output data frame with features selected by the median method: “df_selected_Median”. This data frame has no column named “SampleID”.
Output data frame with features selected by the frequency / common-feature method: “df_process_Output_freq”. This data frame has no column named “SampleID”.
The corresponding feature names: “df_process_frequency_FeatureName”
“df_feature_Output_frequency”: the selected features’ frequencies, ordered by total frequency count. The number of top features selected in the first step is set in the INPUT session via “NUM_COMMON_FEATURES_SET_Frequency”.
“Selected_Frequency_Feature_importance”: the importance values of the selected features, ordered by total frequency count.
“feature_output_df_full”: the frequencies of all features from the steps of the frequency method; it is not ordered.
“all_Output_combined_df_impAvg”: the combined table of frequency and feature importance; it is not ordered.
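The frequency / common-feature idea described above (count how often each feature lands in every model's top list, then rank by total count) can be sketched as follows; the per-model top lists here are invented for illustration:

```r
# Hypothetical top-feature lists from the five trained models
top_lists <- list(
  logistic = c("cg01", "cg02", "cg03"),
  enet     = c("cg02", "cg03", "cg04"),
  xgb      = c("cg02", "cg05", "cg01"),
  rf       = c("cg03", "cg02", "cg06"),
  svm      = c("cg02", "cg01", "cg07")
)

# Count, for each feature, in how many models' top lists it appears
freq <- table(unlist(lapply(top_lists, unique)))
frequency_df <- data.frame(feature     = names(freq),
                           Total_count = as.integer(freq))
frequency_df <- frequency_df[order(-frequency_df$Total_count), ]
head(frequency_df)  # cg02 appears in all five lists, cg01 and cg03 in three
```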
Number of CpGs used: “Number_N_TopNCpGs”
Phenotype features selected:
Number of features before processing: (# phenotype features selected) + (# CpGs used)
Number of features after processing (DMP, data cleaning): “Num_feaForProcess”
Model performance (variable names) - Model Training Stage:
| Initial Model Training Metric | Logistic regression | Elastic Net | XGBoost | Random Forest | SVM |
|---|---|---|---|---|---|
| Training Accuracy | modelTrain_LRM1_trainAccuracy | modelTrain_ENM1_trainAccuracy | modelTrain_xgb_trainAccuracy | modelTrain_rf_trainAccuracy | modelTrain_svm_trainAccuracy |
| Test Accuracy | cm_modelTrain_LRM1_Accuracy | cm_modelTrain_ENM1_Accuracy | cm_modelTrain_xgb_Accuracy | cm_modelTrain_rf_Accuracy | cm_modelTrain_svm_Accuracy |
| Test Kappa | cm_modelTrain_LRM1_Kappa | cm_modelTrain_ENM1_Kappa | cm_modelTrain_xgb_Kappa | cm_modelTrain_rf_Kappa | cm_modelTrain_svm_Kappa |
| AUC (for multiclass, mean AUC using the one-vs-rest method) | modelTrain_LRM1_AUC | modelTrain_ENM1_AUC | modelTrain_xgb_AUC | modelTrain_rf_AUC | modelTrain_svm_AUC |
| Average Test Accuracy during Cross Validation | modelTrain_mean_accuracy_cv_LRM1 | modelTrain_mean_accuracy_cv_ENM1 | modelTrain_mean_accuracy_cv_xgb | modelTrain_mean_accuracy_cv_rf | modelTrain_mean_accuracy_cv_svm |
Number of key features selected (mean/median methods): “INPUT_NUMBER_FEATURES”
Number of key features retained by the frequency method: “Num_KeyFea_Frequency”
Performance of the set of key features (selected under the 3 methods):
Based on Mean:
| Key Features Performance Selected based on Mean | Logistic Regression | Elastic Net | XGBoost | Random Forest | SVM |
|---|---|---|---|---|---|
| Training Accuracy | FeatEval_Mean_LRM1_trainAccuracy | FeatEval_Mean_ENM1_trainAccuracy | FeatEval_Mean_xgb_trainAccuracy | FeatEval_Mean_rf_trainAccuracy | FeatEval_Mean_svm_trainAccuracy |
| Test Accuracy | cm_FeatEval_Mean_LRM1_Accuracy | cm_FeatEval_Mean_ENM1_Accuracy | cm_FeatEval_Mean_xgb_Accuracy | cm_FeatEval_Mean_rf_Accuracy | cm_FeatEval_Mean_svm_Accuracy |
| Test Kappa | cm_FeatEval_Mean_LRM1_Kappa | cm_FeatEval_Mean_ENM1_Kappa | cm_FeatEval_Mean_xgb_Kappa | cm_FeatEval_Mean_rf_Kappa | cm_FeatEval_Mean_svm_Kappa |
| AUC (for multiclass, mean AUC using the one-vs-rest method) | FeatEval_Mean_LRM1_AUC | FeatEval_Mean_ENM1_AUC | FeatEval_Mean_xgb_AUC | FeatEval_Mean_rf_AUC | FeatEval_Mean_svm_AUC |
| Average Test Accuracy during Cross Validation | FeatEval_Mean_mean_accuracy_cv_LRM1 | FeatEval_Mean_mean_accuracy_cv_ENM1 | FeatEval_Mean_mean_accuracy_cv_xgb | FeatEval_Mean_mean_accuracy_cv_rf | FeatEval_Mean_mean_accuracy_cv_svm |
Based on Median:
| Key Features Performance Selected based on Median | Logistic Regression | Elastic Net | XGBoost | Random Forest | SVM |
|---|---|---|---|---|---|
| Training Accuracy | FeatEval_Median_LRM1_trainAccuracy | FeatEval_Median_ENM1_trainAccuracy | FeatEval_Median_xgb_trainAccuracy | FeatEval_Median_rf_trainAccuracy | FeatEval_Median_svm_trainAccuracy |
| Test Accuracy | cm_FeatEval_Median_LRM1_Accuracy | cm_FeatEval_Median_ENM1_Accuracy | cm_FeatEval_Median_xgb_Accuracy | cm_FeatEval_Median_rf_Accuracy | cm_FeatEval_Median_svm_Accuracy |
| Test Kappa | cm_FeatEval_Median_LRM1_Kappa | cm_FeatEval_Median_ENM1_Kappa | cm_FeatEval_Median_xgb_Kappa | cm_FeatEval_Median_rf_Kappa | cm_FeatEval_Median_svm_Kappa |
| AUC (for multiclass, mean AUC using the one-vs-rest method) | FeatEval_Median_LRM1_AUC | FeatEval_Median_ENM1_AUC | FeatEval_Median_xgb_AUC | FeatEval_Median_rf_AUC | FeatEval_Median_svm_AUC |
| Average Test Accuracy during Cross Validation | FeatEval_Median_mean_accuracy_cv_LRM1 | FeatEval_Median_mean_accuracy_cv_ENM1 | FeatEval_Median_mean_accuracy_cv_xgb | FeatEval_Median_mean_accuracy_cv_rf | FeatEval_Median_mean_accuracy_cv_svm |
Based on Frequency:
| Key Features Performance Selected based on Frequency | Logistic Regression | Elastic Net | XGBoost | Random Forest | SVM |
|---|---|---|---|---|---|
| Training Accuracy | FeatEval_Freq_LRM1_trainAccuracy | FeatEval_Freq_ENM1_trainAccuracy | FeatEval_Freq_xgb_trainAccuracy | FeatEval_Freq_rf_trainAccuracy | FeatEval_Freq_svm_trainAccuracy |
| Test Accuracy | cm_FeatEval_Freq_LRM1_Accuracy | cm_FeatEval_Freq_ENM1_Accuracy | cm_FeatEval_Freq_xgb_Accuracy | cm_FeatEval_Freq_rf_Accuracy | cm_FeatEval_Freq_svm_Accuracy |
| Test Kappa | cm_FeatEval_Freq_LRM1_Kappa | cm_FeatEval_Freq_ENM1_Kappa | cm_FeatEval_Freq_xgb_Kappa | cm_FeatEval_Freq_rf_Kappa | cm_FeatEval_Freq_svm_Kappa |
| AUC (for multiclass, mean AUC using the one-vs-rest method) | FeatEval_Freq_LRM1_AUC | FeatEval_Freq_ENM1_AUC | FeatEval_Freq_xgb_AUC | FeatEval_Freq_rf_AUC | FeatEval_Freq_svm_AUC |
| Average Test Accuracy during Cross Validation | FeatEval_Freq_mean_accuracy_cv_LRM1 | FeatEval_Freq_mean_accuracy_cv_ENM1 | FeatEval_Freq_mean_accuracy_cv_xgb | FeatEval_Freq_mean_accuracy_cv_rf | FeatEval_Freq_mean_accuracy_cv_svm |